Science Atlantic Conference — Contributed Talks

Patrick Bowen

University: Saint Francis Xavier University Field: Computer Science Supervisor: Dr. Milton King

Fill in the blank stance detection with language models

Stance detection is a natural language processing task that involves determining whether a snippet of text expresses an opinion in favour of or against some topic. Automatically detecting the stance within text can help organizations determine people's opinions on topics such as a product or an action taken by the organization.

We approached this problem by fine-tuning a pretrained language model (RoBERTa) on tweets expressing opinions on a specific topic. Language models estimate the probability of a sequence of words based on the text they have observed. By fine-tuning a language model on tweets that contain a stance, we adjust its probabilities to more closely reflect that stance. We then use the fine-tuned model to complete a combination of several cloze-style ("fill-in-the-blank") sentences, whose completions are used to assess the stance present in the tweets. We evaluate our model on a subset of English tweets from the SemEval-2016 stance detection shared task.
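The cloze idea can be sketched as follows: wrap each tweet in several templates containing a mask token, let the masked language model score candidate words for the blank, and sum the probability mass that falls on words signalling each stance. The templates, the verbalizer word lists, and the function names below are hypothetical illustrations, not the presenter's actual prompts; the input format assumes the list of `{"token_str", "score"}` dictionaries returned by a Hugging Face fill-mask pipeline.

```python
from collections import defaultdict

# Hypothetical cloze templates; "<mask>" is RoBERTa's mask token.
TEMPLATES = [
    "{tweet} I am <mask> {target}.",
    "{tweet} My stance on {target} is <mask>.",
]

# Hypothetical verbalizer: which filled-in words count as evidence for each stance.
VERBALIZER = {
    "FAVOR": {"for", "pro", "positive", "supportive"},
    "AGAINST": {"against", "anti", "negative", "opposed"},
}


def build_prompts(tweet, target):
    """Fill every template with the tweet and the target topic."""
    return [t.format(tweet=tweet, target=target) for t in TEMPLATES]


def score_stance(predictions_per_prompt):
    """Aggregate fill-mask predictions over all prompts.

    predictions_per_prompt holds one list per prompt of dicts with keys
    "token_str" and "score" (the shape produced by a fill-mask pipeline).
    Returns the winning stance label and the averaged per-stance scores.
    """
    totals = defaultdict(float)
    for predictions in predictions_per_prompt:
        for pred in predictions:
            # RoBERTa tokens may carry a leading space, so normalize them.
            word = pred["token_str"].strip().lower()
            for stance, words in VERBALIZER.items():
                if word in words:
                    totals[stance] += pred["score"]
    # Average over prompts so the number of templates does not bias the scores.
    n = max(len(predictions_per_prompt), 1)
    scores = {stance: totals[stance] / n for stance in VERBALIZER}
    if all(v == 0 for v in scores.values()):
        return "NONE", scores
    return max(scores, key=scores.get), scores
```

With the `transformers` library, the predictions could come from something like `fill = pipeline("fill-mask", model="roberta-base")` followed by `[fill(p, top_k=20) for p in build_prompts(tweet, target)]`, with the fine-tuned checkpoint substituted for `roberta-base`.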

Louis Bu

University: Dalhousie University
Field: Math & CS Supervisor: Dr. Robert Milson Collaborators: V. Menchions, R. Milson, D. Precioso Garcellan

Hybrid Search: Application of Zermelo's Navigation Problem
We aim to develop and test a candidate path-routing algorithm for container ships on long voyages. Fuel costs make up as much as 60% of the total operating costs of maritime transport, and the price of bunker fuel has doubled within the last year. A large container ship consumes around 150 metric tons of bunker fuel per day. An efficient route not only reduces the operating costs of shipping companies but also reduces greenhouse gas emissions from container ship fleets and improves the safety and security of the cargo, the ships, and their crews. Although sea transport is still by far the most efficient way to move cargo, it accounts for nearly 3% of global greenhouse gas emissions, and the International Maritime Organization is imposing strict rules to halve carbon emissions by 2050.

We search for these optimal paths in two stages. First, we combine piecewise locally optimal paths, generated iteratively by solving the initial value problem given by Zermelo's navigation problem, with a simple heuristic. Next, we numerically solve a boundary value problem with a parallelizable, discrete Jacobi-Newton method for path smoothing and global optimization. We implement and compute these paths using Google's JAX (for autograd, i.e., obtaining the gradient of a function by automatic differentiation) and standard scientific computing packages for Python such as NumPy and SciPy. Although we have had initial success with synthetic backgrounds on a small scale, we have yet to implement ways to avoid islands, protected ecological zones, pirate activity, and dangerous wave and wind conditions.
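The initial value problem mentioned above can be sketched as follows. For a ship with constant speed V through the water, heading θ, and current (u, v), Zermelo's navigation equations are x' = V cos θ + u, y' = V sin θ + v, and θ' = sin²θ ∂v/∂x + sin θ cos θ (∂u/∂x − ∂v/∂y) − cos²θ ∂u/∂y. The sketch below integrates them with classical RK4; the linear shear current and all names are hypothetical, and the partial derivatives are written out by hand where the presenters use JAX's automatic differentiation. The heuristic and the Jacobi-Newton boundary value stage are not shown.

```python
import math


def current(x, y):
    """Hypothetical ocean current: a linear shear flow u = c*y, v = 0.
    Returns (u, v, du/dx, du/dy, dv/dx, dv/dy)."""
    c = 0.1
    return c * y, 0.0, 0.0, c, 0.0, 0.0


def zermelo_rhs(state, speed):
    """Right-hand side of Zermelo's navigation equations.
    state = (x, y, theta), with theta the heading relative to the water."""
    x, y, th = state
    u, v, ux, uy, vx, vy = current(x, y)
    s, co = math.sin(th), math.cos(th)
    dx = speed * co + u
    dy = speed * s + v
    # Optimal steering law: the heading turns in response to current gradients.
    dth = s * s * vx + s * co * (ux - vy) - co * co * uy
    return (dx, dy, dth)


def rk4_path(state, speed, dt, steps):
    """Integrate the initial value problem with classical RK4 and
    return the list of visited states (the locally optimal path)."""
    path = [state]
    for _ in range(steps):
        k1 = zermelo_rhs(state, speed)
        k2 = zermelo_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), speed)
        k3 = zermelo_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), speed)
        k4 = zermelo_rhs(tuple(s + dt * k for s, k in zip(state, k3)), speed)
        state = tuple(
            s + dt / 6 * (a + 2 * b + 2 * g + d)
            for s, a, b, g, d in zip(state, k1, k2, k3, k4)
        )
        path.append(state)
    return path
```

For this particular shear current the steering law reduces to d(tan θ)/dt = −c, which gives a convenient check on the integrator.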

David Cassagrande
University: Cape Breton University Field: Math. Supervisor: George Chen, Cape Breton University Mathematics

Blow-up solutions of the nonlinear Schrödinger equation with moving mesh methods

We consider the initial-value problem for the radially symmetric nonlinear Schrödinger equation with cubic nonlinearity (NLS) in d = 2 and 3 space dimensions, and develop a very simple moving mesh method that yields numerical solutions with amplitudes as large as 10^60 near the blow-up point. Our scheme preserves two invariants of the equation, mass and energy, to high accuracy.
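For reference, a standard way to write the problem, assuming the focusing sign convention that permits blow-up, is the radial equation below together with its two conserved quantities (stated up to the constant surface-area factor of the unit sphere). This is the textbook form, not necessarily the exact scaling used in the talk.

```latex
i\,u_t + u_{rr} + \frac{d-1}{r}\,u_r + |u|^2 u = 0,
\qquad u = u(r,t),\ r > 0,\ d \in \{2,3\},

M(t) = \int_0^\infty |u|^2\, r^{d-1}\, dr = M(0),
\qquad
E(t) = \int_0^\infty \Big( |u_r|^2 - \tfrac{1}{2}\,|u|^4 \Big)\, r^{d-1}\, dr = E(0).
```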

Peter Collier

University: Dalhousie Field: Math, Stats or CS

Zero Forcing on subgraphs of Proper Interval Graphs

The zero forcing number is a graph parameter introduced by the AIM Minimum Rank – Special Graphs Work Group in 2007 to bound the minimum rank of a graph from below. Zero forcing is a type of graph infection process in which a colour change rule is applied iteratively to a graph and an initial set of coloured vertices, S: a coloured vertex with exactly one uncoloured neighbour forces that neighbour to become coloured. If S results in the entire graph becoming coloured, we call S a zero forcing set. The size of the smallest zero forcing set of a graph G is called the zero forcing number of G. In this talk, I will demonstrate minimal zero forcing sets for families of proper interval graphs, as well as for some subgraphs of these graphs. From these, I am able to determine the expected zero forcing number of randomly generated subgraphs of these families of proper interval graphs.
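The colour change rule is easy to state in code. The sketch below repeatedly applies the rule to obtain the closure of a starting set and finds the zero forcing number by brute force over subsets of increasing size, which is only feasible for small graphs; the function names are illustrative, and this is not the method used to obtain the results in the talk.

```python
from itertools import combinations


def closure(adj, initial):
    """Apply the colour change rule until nothing changes: a coloured
    vertex with exactly one uncoloured neighbour forces that neighbour."""
    coloured = set(initial)
    changed = True
    while changed:
        changed = False
        for v in list(coloured):
            white = [w for w in adj[v] if w not in coloured]
            if len(white) == 1:
                coloured.add(white[0])
                changed = True
    return coloured


def is_zero_forcing_set(adj, s):
    """S is a zero forcing set if its closure is the whole vertex set."""
    return len(closure(adj, s)) == len(adj)


def zero_forcing_number(adj):
    """Smallest zero forcing set size, by brute force over subsets."""
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for s in combinations(vertices, k):
            if is_zero_forcing_set(adj, s):
                return k
```

For example, a path (itself a proper interval graph) is forced by one endpoint, a cycle needs two adjacent vertices, and a complete graph on n vertices needs n − 1.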

Ruoyan (Christine) Fang

University: Dalhousie University. Field: Mathematics Supervisor: Dr. Karl Dilcher

Stern’s Diatomic Sequence and its Analogue on Z[sqrt2]

Stern's diatomic sequence (also called the Stern sequence) was first studied in 1858, yet new results about it have continued to appear since 2020. This talk will introduce different ways to define the Stern sequence. We describe a diatomic array, by analogy with the triangular array of Pascal's triangle, and find a parallel relation with the Fibonacci sequence. We will also discuss the generating function and a recurrence formula for the Stern sequence. The ratios of successive terms of the Stern sequence give every positive rational number exactly once and also form a binary tree, due to Calkin and Wilf. We will define a map from the rationals whose denominators are of the form 2^n to all rationals, which is the inverse of Minkowski's question mark function. Finally, we will discuss an analogue of the Stern sequence in Z[sqrt2] with properties similar to those of the original Stern sequence.
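One common definition of the Stern sequence is the recurrence s(0) = 0, s(1) = 1, s(2n) = s(n), s(2n+1) = s(n) + s(n+1). The sketch below generates the sequence from this recurrence and lists the Calkin–Wilf enumeration s(n)/s(n+1) of the positive rationals; the function names are illustrative.

```python
from fractions import Fraction


def stern(n_terms):
    """First n_terms of Stern's diatomic sequence:
    s(0)=0, s(1)=1, s(2n)=s(n), s(2n+1)=s(n)+s(n+1)."""
    s = [0, 1]
    for n in range(2, n_terms):
        if n % 2 == 0:
            s.append(s[n // 2])
        else:
            s.append(s[n // 2] + s[n // 2 + 1])
    return s[:n_terms]


def calkin_wilf(count):
    """First `count` positive rationals in Calkin-Wilf order,
    obtained as ratios of successive Stern numbers s(n)/s(n+1), n >= 1."""
    s = stern(count + 2)
    return [Fraction(s[n], s[n + 1]) for n in range(1, count + 1)]
```

The first terms are 0, 1, 1, 2, 1, 3, 2, 3, 1, and the ratios begin 1, 1/2, 2, 1/3, 3/2, 2/3, 3, with no rational repeated.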

Lauren Farrell

University: Mount Allison University Field: Math Supervisor: Dr. Matt Betti Collaborator: Dr. Jane Heffernan

A Pair Formation Model with Recovery of Monkeypox

Monkeypox is a disease that spreads through close, prolonged contact with an infected individual, similarly to a sexually transmitted infection (STI). However, the recent global outbreak of monkeypox is unusual because transmission has been concentrated mainly among men who have sex with men, and infected individuals can recover with lifelong immunity. This novel situation can be modeled by combining a pair formation model, which is generally used to model STI spread, with an SIR model.
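To make the combination concrete, here is a minimal sketch of one possible pair-formation SIR model, assuming a single-sex population in which singles form partnerships at rate ρ with randomly chosen single partners, partnerships dissolve at rate σ, transmission happens only inside susceptible-infected pairs at rate β, and every infected individual recovers at rate γ. The compartment structure and all parameter values are hypothetical illustrations, not the model or estimates from the talk.

```python
def pair_sir_rhs(y, rho, sigma, beta, gamma):
    """Right-hand side of a hypothetical pair-formation SIR model.
    y = [S, I, R, P_SS, P_SI, P_SR, P_II, P_IR, P_RR]: single individuals
    by disease status, then the number of each (unordered) pair type."""
    S, I, R, PSS, PSI, PSR, PII, PIR, PRR = y
    T = S + I + R                     # singles available to form pairs
    f = rho / T if T > 0 else 0.0     # each single pairs at rate rho, partner chosen at random
    dS = -rho * S + sigma * (2 * PSS + PSI + PSR)
    dI = -rho * I + sigma * (PSI + 2 * PII + PIR) - gamma * I
    dR = -rho * R + sigma * (PSR + PIR + 2 * PRR) + gamma * I
    dPSS = f * S * S / 2 - sigma * PSS
    dPSI = f * S * I - (sigma + beta + gamma) * PSI   # transmission only inside S-I pairs
    dPSR = f * S * R - sigma * PSR + gamma * PSI
    dPII = f * I * I / 2 - sigma * PII + beta * PSI - 2 * gamma * PII
    dPIR = f * I * R - sigma * PIR + 2 * gamma * PII - gamma * PIR
    dPRR = f * R * R / 2 - sigma * PRR + gamma * PIR
    return [dS, dI, dR, dPSS, dPSI, dPSR, dPII, dPIR, dPRR]


def simulate(y0, days, dt=0.01, rho=0.5, sigma=0.5, beta=1.0, gamma=0.1):
    """Forward-Euler integration with hypothetical (unfitted) parameters."""
    y = list(y0)
    for _ in range(int(round(days / dt))):
        dy = pair_sir_rhs(y, rho, sigma, beta, gamma)
        y = [a + dt * b for a, b in zip(y, dy)]
    return y


def population(y):
    """Total number of individuals: singles plus two per pair."""
    S, I, R, PSS, PSI, PSR, PII, PIR, PRR = y
    return S + I + R + 2 * (PSS + PSI + PSR + PII + PIR + PRR)
```

Because every term that removes individuals from one compartment adds them to another, the total population is conserved, which is a useful sanity check when extending such a model.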

Wangwei Han

University Name: University of Prince Edward Island Field: CS Supervisor: Dr. Antonio Bolufe-Rohler

Machine Learning for Parameter Tuning, An Application to Differential Evolution

Metaheuristics are characterized by their ability to find sufficiently good solutions to very hard optimization problems. Algorithms such as Differential Evolution are currently state of the art in many fields; however, the performance of Differential Evolution is strongly influenced by the values chosen for its parameters. The most relevant parameters in Differential Evolution are the population size, the crossover probability, and the mutation factor. In this research, we present a novel way of tuning these parameters using machine learning techniques. We collect data characterizing the optimization process and associate it with the result of modifying each parameter independently. We use this information to train several classification models that decide how to adjust each parameter. The trained models are then used to adjust the parameters after each execution of Differential Evolution. Computational results on the CEC'13 benchmark suite show that this approach is very effective and leads to a significant improvement in performance.
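For readers unfamiliar with the three parameters, the sketch below is the classic DE/rand/1/bin variant with the population size, crossover probability, and mutation factor exposed as arguments. It is a generic textbook implementation with illustrative names, not the presenters' code, and it omits the machine-learning tuning layer, which would choose new values for these arguments between runs.

```python
import random


def differential_evolution(f, bounds, pop_size=20, cr=0.9, mutation=0.5,
                           generations=200, seed=0):
    """Classic DE/rand/1/bin minimizing f over a box.
    pop_size, cr (crossover probability) and mutation (factor F) are the
    three parameters whose tuning the abstract describes; pop_size >= 4."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + mutation * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

A call such as `differential_evolution(lambda x: sum(t * t for t in x), [(-5, 5)] * 3)` minimizes the sphere function; changing `pop_size`, `cr`, or `mutation` visibly changes how quickly it converges.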

Xiaoyu Jia

University: Dalhousie University Field: Mathematics Supervisor: Karl Dilcher

Modular Forms and Convolution Sums of the Divisor Function

In a paper published in 1916, Ramanujan derived some remarkable results on certain types of functions. Astonishingly, modular forms, whose theory was developed much later, turned out to be special cases of these functions. While the space of modular forms is well characterized, the space generated by some “almost-modular forms” remains mysterious. We will present empirical results demonstrating that these “almost-modular forms” are far from problematic and actually make the theory richer.
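A classical example of a convolution sum of the divisor function is Σ_{m=1}^{n−1} σ(m)σ(n−m) = (5σ₃(n) + (1 − 6n)σ(n))/12, which follows from comparing coefficients in Ramanujan's relation q dP/dq = (P² − Q)/12 for the Eisenstein series P = 1 − 24Σσ(n)qⁿ (a standard example of an "almost-modular", or quasimodular, form) and Q = 1 + 240Σσ₃(n)qⁿ. The sketch below checks this identity numerically; it illustrates the kind of empirical check described, not the presenter's actual experiments.

```python
def sigma(n, k=1):
    """Sum of the k-th powers of the divisors of n."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)


def convolution_sum(n):
    """Left-hand side: sum of sigma(m) * sigma(n - m) for 1 <= m < n."""
    return sum(sigma(m) * sigma(n - m) for m in range(1, n))


def ramanujan_formula(n):
    """Right-hand side (5*sigma_3(n) + (1 - 6n)*sigma(n)) / 12,
    computed in exact integer arithmetic (the numerator is divisible by 12)."""
    return (5 * sigma(n, 3) + (1 - 6 * n) * sigma(n)) // 12
```

For instance, n = 3 gives σ(1)σ(2) + σ(2)σ(1) = 6 on the left and (5·28 − 17·4)/12 = 6 on the right.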

Matthew Kozma

University: University of Prince Edward Island Field: CS Supervisor: Dr. Campeanu

An Analysis and Overview of Deterministic Finite Automata with Output

Peilin Li

University: Dalhousie University. Field: Math Supervisor: Dr. Sara Faridi

The interaction between simplicial collapsing and Betti numbers of monomial resolutions

My talk will be a gentle introduction to simplicial collapsing (making a simplicial complex smaller by deleting some of its faces) and how it interacts with free resolutions of monomial ideals. Simplicial complexes are widely used in topology, and free resolutions are used in commutative algebra to encode relations between polynomials. A minimal free resolution of an ideal produces a sequence of integers called “Betti numbers”. In my talk, I will show how one can enlarge a monomial ideal one generator at a time while keeping track of the Betti numbers using simplicial collapses.
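An elementary collapse removes a free face σ (a face contained in exactly one other face τ) together with τ. The sketch below stores a complex as a set of frozensets and collapses it greedily until no free face remains; it illustrates only the collapsing operation, not its connection to monomial resolutions or Betti numbers, and the function names are illustrative.

```python
from itertools import combinations


def closure_of(facets):
    """All nonempty faces of the simplicial complex generated by the facets."""
    faces = set()
    for f in facets:
        f = frozenset(f)
        for k in range(1, len(f) + 1):
            faces.update(frozenset(c) for c in combinations(f, k))
    return faces


def free_pair(faces):
    """Return (sigma, tau) where sigma is a proper face of exactly one face
    tau and tau has one more vertex than sigma, or None if no such pair."""
    for sigma in sorted(faces, key=len):
        cofaces = [tau for tau in faces if sigma < tau]
        if len(cofaces) == 1 and len(cofaces[0]) == len(sigma) + 1:
            return sigma, cofaces[0]
    return None


def collapse(faces):
    """Perform elementary collapses until no free face remains."""
    faces = set(faces)
    while (pair := free_pair(faces)) is not None:
        faces -= set(pair)
    return faces
```

A filled triangle collapses all the way down to a single vertex, while the hollow triangle (its boundary) has no free face and cannot be collapsed at all.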

Alastair May

University: Saint Francis Xavier University Field: Computer Science Supervisors: Dr. Taylor Smith, Dr. Milton King

Creating visual representations of finite state automata

A finite automaton is an abstract model of computation that accepts or rejects strings of characters based on a finite sequence of computation steps. Visually, a finite automaton consists of a set of nodes, called states, and transitions between them. This presentation describes my work on adding functionality to Grail, a software package that lets users manipulate finite automata and perform various useful operations on them, such as converting them to and from regular expressions. The new functionality I am adding is the ability to generate visual diagrams from text files, produced by Grail, that contain the specifications of automata.
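One way to produce such diagrams is to translate the automaton into the Graphviz DOT language and let Graphviz lay it out. The sketch below assumes a Grail-style text format with one instruction per line, `p a q` for a transition, `(START) |- q` for a start state, and `q -| (FINAL)` for a final state; that format, the function name, and the styling choices are assumptions for illustration rather than a description of the presenter's tool.

```python
def grail_to_dot(text, name="fa"):
    """Convert an automaton in an assumed Grail-style text format into
    Graphviz DOT source: final states become double circles and each
    start state gets an arrow from an invisible point node."""
    start, finals, edges = [], set(), []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src, label, dst = parts
        if src == "(START)" and label == "|-":
            start.append(dst)
        elif label == "-|" and dst == "(FINAL)":
            finals.add(src)
        else:
            edges.append((src, label, dst))
    out = [f"digraph {name} {{", "  rankdir=LR;"]
    for q in sorted(finals):
        out.append(f'  "{q}" [shape=doublecircle];')
    for i, q in enumerate(start):
        out.append(f'  "__start{i}" [shape=point];')
        out.append(f'  "__start{i}" -> "{q}";')
    for src, label, dst in edges:
        out.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    out.append("}")
    return "\n".join(out)
```

Writing the result to `fa.dot` and running `dot -Tpng fa.dot -o fa.png` would then render the diagram; a left-to-right layout (`rankdir=LR`) is a common choice for automata.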

Zachary Murray

University: Dalhousie University Field: Math Supervisors: Dr. Dorette Pronk, Dr. Martin Szyld

Implementing Double Categories in the Lean Proof Assistant
In informal mathematics, it is generally easy to switch between equivalent forms of a definition on the fly, as needed. This is often not the case for mathematics in a proof assistant, where the particular requirements for expressing mathematics can compel users to favour some kinds of definitions over others.

We have worked to implement double categories in the Lean proof assistant in two main styles: as categories internal to Cat and in the more specific square-oriented fashion. We will examine our various attempts at implementing these definitions, how we have adjusted them, and which may ultimately be a better fit for Lean.
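For orientation, the data behind the two styles can be summarized as follows for a strict double category D. In the square-oriented style, there are objects, horizontal arrows f : A → B, vertical arrows u : A → A', and squares α bounded by four arrows, with a horizontal composite (written α | β) and a vertical composite (written α ⊖ γ) that satisfy the interchange law. In the internal style, D₀ is the category of objects and vertical arrows, D₁ is the category of horizontal arrows and squares, and source, target, identity, and composition are functors satisfying the category axioms. This is the standard textbook presentation, not the presenters' Lean code.

```latex
% Square-oriented style: a square \alpha with its four boundary arrows
\begin{array}{ccc}
A & \xrightarrow{\ f\ } & B \\
{\scriptstyle u}\,\big\downarrow & \alpha & \big\downarrow\,{\scriptstyle v} \\
A' & \xrightarrow[\ g\ ]{} & B'
\end{array}
\qquad
(\alpha \mid \beta) \ominus (\gamma \mid \delta) \;=\; (\alpha \ominus \gamma) \mid (\beta \ominus \delta)

% Internal style: a category internal to Cat
s,\, t : \mathbb{D}_1 \to \mathbb{D}_0, \qquad
i : \mathbb{D}_0 \to \mathbb{D}_1, \qquad
c : \mathbb{D}_1 \times_{\mathbb{D}_0} \mathbb{D}_1 \to \mathbb{D}_1
```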

Saurav Neupane

University name: University of Prince Edward Island (UPEI) Field: Stats Supervisor: Dr. Michael A. McIsaac

Analysis of the factors affecting access to in-school music education in Canadian adolescents

Music education is an integral part of the elementary curriculum in Canada: music/arts education is mandatory from Grades K-5 in all provinces. However, it is unknown whether all students have equal access to in-school music education after Grade 5. School demographics, funding, and resources pose challenges in providing Canadian adolescents with proper access to in-school music education. The project's aim is to identify gaps in the available literature on the relationship between variables representing social location (age, gender, race, geographic location, etc.) and access to in-school music education, and on the impacts of in-school music education on Canadian adolescents.

We will then attempt to address these gaps using three distinct data sets: the National 2020 Music Education Survey, the Health Behaviour in School-Aged Children Survey (HBSC), and the MusiCounts Band Aid Program. The National 2020 Music Education Survey is a historic study on the state of music education in Canada, launched in collaboration with six national organizations. The HBSC is an ongoing, cross-national World Health Organization research study of youth aged 11 to 15 years that collects data every 4 years to gain insight into young people's well-being, health behaviours, and social contexts. In Canada, the HBSC collects anonymous data from students in Grades 6 to 10. The MusiCounts Band Aid Program provides under-resourced schools with grants of up to $15,000 worth of musical instruments, equipment, and resources. I have begun analyzing the National 2020 Music Education Survey to examine the relationship between various factors and access to in-school music education across Canada.

Dylan Pearson

University: Mount Allison Field: Math. Supervisor: Dr. Margaret Messinger Collaborators: Danny Dyer, Melissa Hugan

Slow Localization

Localization is a turn-based pursuit-evasion game played on a graph, where one player controls a set of cops and the other player controls a robber. The cops do not know the robber's location and attempt to locate the robber via distance queries. On their turn, both the cops and the robber can move to new vertices or stay put; the cops may move to any vertex, while the robber's movement is restricted to adjacent vertices. We study a variation of the localization game in which the cops are also restricted to moving to adjacent vertices on their turn. The minimum number of cops required to locate the robber in this variant is called the slow localization number. We compare the slow localization number with the localization number on different graph classes and determine the slow localization number of caterpillars, wheels, and cocoons.
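From the cops' side, one round of the game amounts to maintaining a set of vertices where the robber could be: the robber may step to a neighbour or stay put, so the set grows to its closed neighbourhood, and the distance answers then filter it. The sketch below shows that bookkeeping plus the restricted cop moves of the slow variant; the ordering of moves and probes and all names are simplifying assumptions, not the formal rules used in the talk.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def update_candidates(adj, candidates, cops, answers):
    """One round from the cops' point of view: grow the candidate set to
    its closed neighbourhood (the robber may move or stay), then keep only
    vertices consistent with every cop's distance answer."""
    grown = set(candidates)
    for v in candidates:
        grown.update(adj[v])
    dists = [bfs_distances(adj, c) for c in cops]
    return {v for v in grown
            if all(d.get(v) == a for d, a in zip(dists, answers))}


def slow_cop_moves(adj, cop):
    """In slow localization a cop may only stay put or move to a neighbour."""
    return {cop, *adj[cop]}
```

On a path, a single cop at an endpoint locates the robber immediately; on a 4-cycle, one probe from vertex 0 answering 1 leaves two possibilities.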

Logan Pipes

Institution: Mount Allison Field: Math Supervisor: Nathaniel Johnston Collaborators: Nathaniel Johnston

Bounding Real Tensor Optimizations via the Numerical Range

Numerous bounds on a certain optimization problem over product tensors are discussed, with a focus on bounds that use the numerical range of a matrix. The numerical-range bound is equal to the one attained by a common semidefinite relaxation technique, but it can be implemented without running any semidefinite programs.
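The numerical range of a square matrix A is the set W(A) = {x*Ax : ‖x‖ = 1}, a convex subset of the complex plane. It can be traced without any semidefinite programming: for each angle θ, a unit eigenvector for the largest eigenvalue of the Hermitian matrix (e^{−iθ}A + e^{iθ}A*)/2 yields a boundary point. The sketch below is this generic computation with illustrative names; it does not reproduce the specific tensor bound from the talk.

```python
import numpy as np


def numerical_range_boundary(A, num_angles=360):
    """Sample boundary points of W(A) = {x* A x : ||x|| = 1}.
    For each angle theta, the top eigenvector of the Hermitian matrix
    H(theta) = (e^{-i theta} A + e^{i theta} A^*) / 2 gives a boundary point."""
    A = np.asarray(A, dtype=complex)
    points = []
    for theta in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False):
        H = (np.exp(-1j * theta) * A + np.exp(1j * theta) * A.conj().T) / 2
        eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
        x = eigvecs[:, -1]                    # unit eigenvector for the largest eigenvalue
        points.append(np.vdot(x, A @ x))      # vdot conjugates its first argument
    return np.array(points)


def numerical_radius(A, num_angles=360):
    """Largest modulus over the sampled boundary of the numerical range."""
    return float(np.max(np.abs(numerical_range_boundary(A, num_angles))))
```

For the 2×2 nilpotent Jordan block the numerical range is the disk of radius 1/2, and for a real diagonal matrix it is the interval between the smallest and largest diagonal entries.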

Alexander Saunders

University: Saint Mary's University Field: Math Supervisor: Dr. Manuela Girotti

Tiling and Asymptotics of the Aztec Diamond
In this talk we consider the Aztec Diamond. The Aztec Diamond of order n, introduced by N. Elkies, G. Kuperberg, M. Larsen, and J. Propp in 1992, is formed by taking n rows with 2, 4, ..., 2n cells, stacking them on top of each other to form a staircase centred about the y-axis, and then reflecting this staircase across the x-axis. This structure is of combinatorial interest when randomly tiled by 2×1 dominoes: using the Lindström–Gessel–Viennot lemma and a connection with orthogonal polynomials, it is possible to describe all the probabilistic quantities of the model exactly and to count the total number of possible tilings of the diamond. Furthermore, for large n a typical tiling partitions the Aztec Diamond into five distinct regions (a central “temperate” region and four outer “polar” regions), and the central region becomes arbitrarily close to a deterministic shape. The fluctuations of its boundary (referred to as the “Arctic Circle”) can be analyzed with tools from Random Matrix Theory. More precisely, the fluctuations of a point on the Arctic Circle are given by the Tracy–Widom distribution, and they are universal. In this talk, we will see an overview of these classical results and some of the theory that supports them. We will also present code that generates random tilings of an Aztec Diamond of any size, so that we can observe the Arctic Circle arising.
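A classical result of Elkies, Kuperberg, Larsen, and Propp is that the Aztec Diamond of order n has exactly 2^{n(n+1)/2} domino tilings. The sketch below builds the diamond's cells row by row, as described above, and counts tilings by brute-force backtracking for small n; it is an illustrative check of the count, not the presenter's random tiling generator (random tilings of large diamonds are usually produced with the domino shuffling algorithm, which is not implemented here).

```python
def aztec_cells(n):
    """Unit squares of the Aztec Diamond of order n as (row, col) pairs:
    rows of length 2, 4, ..., 2n, 2n, ..., 4, 2, all centred on one vertical line."""
    cells = set()
    for r in range(2 * n):
        k = r if r < n else 2 * n - 1 - r   # distance from the nearest tip row
        for c in range(n - 1 - k, n + 1 + k):
            cells.add((r, c))
    return cells


def count_tilings(cells):
    """Count domino tilings by always covering the first free cell in
    row-major order with either a horizontal or a vertical domino."""
    def rec(free):
        if not free:
            return 1
        r, c = min(free)                     # every cell before it is already covered
        total = 0
        if (r, c + 1) in free:               # horizontal domino
            total += rec(free - {(r, c), (r, c + 1)})
        if (r + 1, c) in free:               # vertical domino
            total += rec(free - {(r, c), (r + 1, c)})
        return total
    return rec(frozenset(cells))
```

For n = 1, 2, 3, 4 this gives 2, 8, 64, and 1024 tilings, matching 2^{n(n+1)/2}.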

Crystal Sharpe

University name: Mount Allison University Field: CS Supervisor: Laurie Ricker Collaborators: Hervé Marchand

Mutual Opacity between Multiple Adversaries

We investigate opacity, an information-flow privacy property, in a setting with two competing agents, or adversaries, each of which aims to hide its own secrets while exposing the secrets of the other. Each agent has only partial information about the state of the system, and both the system and the agents are modelled as finite automata. The agents can achieve their objectives by enabling or disabling events from their sets of controllable events. We examine two scenarios. In the first, the agents are passive, with no control capabilities, and we seek a global controller that enforces their mutual opacity. In the second, the formerly passive agents are autonomous and have control capabilities. Here we investigate whether two controllers, one for each agent, can be synthesized so that one adversary has a winning control strategy that always discovers the secrets of the other without revealing its own.
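A basic building block for such questions is checking current-state opacity for a single observer: build the observer automaton (the subset construction over the observer's observable events) and verify that no reachable state estimate lies entirely inside the secret states. The sketch below does this for a deterministic automaton given as a transition dictionary; running it once per agent, with that agent's observable events and the other agent's secrets, is one hypothetical way to test mutual opacity of a fixed system. Controller synthesis is not shown.

```python
def unobservable_reach(states, delta, unobservable):
    """All states reachable from `states` using only unobservable events."""
    stack, reach = list(states), set(states)
    while stack:
        q = stack.pop()
        for e in unobservable:
            nxt = delta.get((q, e))
            if nxt is not None and nxt not in reach:
                reach.add(nxt)
                stack.append(nxt)
    return frozenset(reach)


def is_current_state_opaque(delta, initial, events, observable, secret):
    """Explore the observer's reachable state estimates; the secret is
    revealed if some estimate is contained in the secret states."""
    unobservable = set(events) - set(observable)
    start = unobservable_reach({initial}, delta, unobservable)
    seen, stack = {start}, [start]
    while stack:
        est = stack.pop()
        if est <= secret:
            return False  # the observer is certain the system is in a secret state
        for e in observable:
            succ = {delta[(q, e)] for q in est if (q, e) in delta}
            if succ:
                nxt = unobservable_reach(succ, delta, unobservable)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return True
```

In a small example where the secret state 3 is reached after an unobservable event followed by `a`, an observer who cannot see the unobservable event always confuses state 3 with state 2, while an observer who sees every event does not.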

Benjamin Stanley
University: Memorial University of Newfoundland Field: Math Supervisor: Dr. David Pike

A Survey of the 2-Block Intersection Graphs of Twofold Triple Systems
Twofold Triple Systems are a type of balanced incomplete block design with
interesting and potentially useful structure. This structure can be
represented and analyzed by constructing 2-Block Intersection Graphs out
of the Twofold Triple Systems. We asked whether there exist large
examples of such graphs that also have other interesting properties,
such as planarity or vertex-transitivity. To this end, we surveyed
nearly 8.5 billion such graphs, testing for these and other
properties. We identified a number of previously known graphs as well
as notable large new graphs that may warrant further study.
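In a twofold triple system every pair of points lies in exactly two blocks (triples), and its 2-block intersection graph has the blocks as vertices, with two blocks adjacent when they share exactly two points. The sketch below checks the defining property and builds the graph; it is a brute-force illustration with illustrative names, not the software used for the survey.

```python
from collections import Counter
from itertools import combinations


def is_twofold_triple_system(points, blocks):
    """Every pair of points must lie in exactly two blocks (repeats allowed)."""
    point_set = set(points)
    if any(len(set(b)) != 3 or not set(b) <= point_set for b in blocks):
        return False
    counts = Counter(frozenset(p) for b in blocks for p in combinations(b, 2))
    return all(counts[frozenset(p)] == 2 for p in combinations(points, 2))


def two_block_intersection_graph(blocks):
    """Vertices are block indices; two blocks are adjacent exactly when
    they share exactly two points."""
    adj = {i: set() for i in range(len(blocks))}
    for i, j in combinations(range(len(blocks)), 2):
        if len(set(blocks[i]) & set(blocks[j])) == 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

The four triples of a 4-point set form a twofold triple system whose graph is the complete graph K4, while two copies of the Fano plane form one whose graph has no edges, since distinct lines meet in one point and repeated lines in three.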

Yuchen Wei

University: Saint Francis Xavier University Field: Computer Science Supervisor: Dr. Milton King

Short temporal word sense disambiguation

Word sense disambiguation (WSD) is a natural language processing task that involves assigning a definition-like entity, known as a sense, to a word in context. Recently, WSD models were shown to benefit from considering the author’s preferred senses. For example, a sports enthusiast would more likely use the sports-related sense of the word court than the legal sense. We extend this idea by considering when the text was written on a short temporal scale (e.g., days, weeks, seasons). Our preliminary experiments on a set of English blog posts showed that different temporal values (e.g., different days of the week) prefer different senses of the same word. A state-of-the-art WSD model achieved better performance when temporal information was considered. We analyze and show the impact that each temporal period has on our model. Furthermore, we augment our temporal-based method with knowledge of the author’s preferred senses to generate a personalized temporal-based WSD model.
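One simple way to inject temporal information is to estimate a sense prior for each time period from tagged data and combine it with the WSD model's probabilities in log space. The sketch below is a hypothetical log-linear combination with add-alpha smoothing; the sense keys, the `weight` parameter, and the function names are illustrative assumptions, not the method from the talk.

```python
import math
from collections import Counter, defaultdict


def temporal_prior(tagged, senses, alpha=1.0):
    """Estimate P(sense | time period) from (period, sense) training pairs
    with add-alpha smoothing over the given sense inventory."""
    counts = defaultdict(Counter)
    for period, sense in tagged:
        counts[period][sense] += 1
    prior = {}
    for period, c in counts.items():
        total = sum(c.values()) + alpha * len(senses)
        prior[period] = {s: (c[s] + alpha) / total for s in senses}
    return prior


def rerank(model_probs, period, prior, weight=1.0):
    """Pick the sense maximizing log P_model(s | context) + weight * log P(s | period).
    Periods without training data fall back to the model alone."""
    p = prior.get(period)

    def score(s):
        base = math.log(model_probs[s])
        return base + weight * math.log(p[s]) if p else base

    return max(model_probs, key=score)
```

With a model that slightly prefers the legal sense of court, a Saturday prior learned from mostly sports-related uses can flip the decision, while a period with no data leaves the model's choice unchanged.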

Grace Wilson

University: University of Prince Edward Island Field: Math Supervisor: Dr. Shannon Fitzpatrick

Eviction on Triangulated Grids

This talk presents a study of eternal domination on m × n triangular grids under the eviction model, in which, whenever a vertex in the dominating set (known as a guard) is “attacked”, the guard must move to an adjacent vertex. It provides a summary of results, along with bounds toward more general results.
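The objects involved can be sketched as follows, assuming a triangular grid built as an m × n grid graph with the same diagonal added in every unit square, and one common form of the eviction rule in which the attacked guard moves to an unguarded neighbour and the guards must still dominate the graph afterwards. The grid orientation, the rule details, and all names are assumptions for illustration; this is not a solver for the eternal domination number.

```python
def triangular_grid(m, n):
    """m x n grid graph plus the (i, j)-(i+1, j+1) diagonal in every unit
    square, so every inner face is a triangle."""
    adj = {(i, j): set() for i in range(m) for j in range(n)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            w = (i + di, j + dj)
            if w in adj:
                adj[(i, j)].add(w)
                adj[w].add((i, j))
    return adj


def is_dominating(adj, guards):
    """Every vertex holds a guard or is adjacent to one."""
    return all(v in guards or adj[v] & guards for v in adj)


def evict(adj, guards, attacked):
    """Respond to an attack on a guarded vertex: move that guard to an
    unguarded neighbour so the new configuration is still dominating.
    Returns the new guard set, or None if no such move exists."""
    for w in sorted(adj[attacked] - guards):
        new = (guards - {attacked}) | {w}
        if is_dominating(adj, new):
            return new
    return None
```

On the 3 × 3 triangular grid, the centre vertex alone does not dominate (with this diagonal it misses two corners), but the centre together with those two corners does, and that configuration survives an attack on either corner.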

Jingyuan (Crystal) Zhang

University: Dalhousie Field: Statistics and Actuarial Science Supervisor: Dr. Toby Kenney

Healthcare fraud and abuse detection using machine learning

Fraud and abuse are two of the most well-known challenges in the healthcare insurance industry. They can cause significant financial losses and harm the healthcare system by undermining the quality of service provided to legitimate patients. Therefore, effective fraud and abuse detection methods are essential to healthcare-related parties. Bayerstadler et al. (2016) use Bayesian multinomial latent variable modelling for fraud and abuse detection in health insurance. Their sampling processes are limited by a model calibration bias that affects model fitting and lowers predictive performance. This work proposes a data framework to preprocess clinical data from the Centers for Medicare & Medicaid Services (CMS) and the List of Excluded Individuals and Entities (LEIE) to train methods for fraud detection. We use the processed data to train a weighted subspace random forest model for predicting fraud. We identify a potential issue with the data: since the LEIE data only include individuals who were convicted of fraud, there are almost certainly fraudulent claims that were not detected in the data. We therefore perform a simulation to study this issue by randomly mislabeling some of the fraud cases in the data set as non-fraudulent, which allows us to assess the effect of this mislabeling. The results show that mislabeled observations do affect the performance of the model, but the effect is modest. We are therefore confident that our fitted model has not been excessively affected by this issue.
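The mislabeling simulation can be sketched as a function that flips a chosen fraction of the positive (fraud) labels to negative, mimicking fraud that was never caught and therefore never listed as excluded. The model would then be retrained on the noisy labels and its performance compared with the clean-label fit. The function name and the 0/1 label encoding are illustrative assumptions.

```python
import random


def mislabel(labels, rate, seed=0):
    """Return a copy of 0/1 labels in which a fraction `rate` of the
    positive (fraud) labels, chosen uniformly at random, is flipped to 0."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    flipped = set(rng.sample(positives, round(rate * len(positives))))
    return [0 if i in flipped else y for i, y in enumerate(labels)]
```

Repeating this for several rates and seeds gives a simple picture of how sensitive the fitted model is to undetected fraud.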