Computational Sciences - Master's theses
Permanent URI for this collection: https://laurentian.scholaris.ca/handle/10219/2096
Browsing Computational Sciences - Master's theses by Title
Now showing 1 - 20 of 112
Item The 0-1 multiple knapsack problem (2017-05-31) Shamakhai, Hayat Abdullah
In operations research, the Multiple Knapsack Problem (MKP) is classified as a combinatorial optimization problem. It is a particular case of the Generalized Assignment Problem. The MKP has been applied in many areas, naval as well as financial management among them. There are several methods to solve the Knapsack Problem (KP) and the Multiple Knapsack Problem, in particular the Bound and Bound algorithm (B&B), a modification of the Branch and Bound algorithm, which is a tree-search technique for integer linear programming used to obtain an optimal solution. In this research, we provide a new approach, called the Adapted Transportation Algorithm (ATA), to solve the KP and MKP. The solution results of these methods are presented in this thesis. The Adapted Transportation Algorithm is applied to solve the Multiple Knapsack Problem in the case where the unit profit of the items depends on the knapsack. In addition, we show the link between the Multiple Knapsack Problem and the Multiple Assignment Problem (MAP). These results open a new field of research: solving the KP and MKP using algorithms developed for transportation problems.

Item The (a, b, r) class of discrete distributions with applications (2020-09-29) Yartey, Esther
In the insurance field, the number of events such as losses to the insured or claims to the insurance company is an important aspect of loss modeling. Understanding the size of claims in terms of numbers and amounts makes it possible to modify and address issues related to creating insurance contracts. In general, certain counting (or discrete) distributions are used to model the number and amount of claims. There are situations where the modelled probability of having no claim is high; indeed, this is a desirable case for the benefit of insurance companies.
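Claim-count models of the kind discussed here are built from the Panjer recursion, which for the (a, b, 0) class defines p_k = (a + b/k) p_{k-1}. A minimal sketch (assuming the Poisson member, where a = 0, b = λ and p0 = e^{-λ}; the values are illustrative, not from the thesis):

```python
import math

def panjer_ab0_pmf(a, b, p0, kmax):
    """Build the pmf of an (a, b, 0) class distribution via the
    Panjer recursion p_k = (a + b/k) * p_{k-1}, starting from p0."""
    pmf = [p0]
    for k in range(1, kmax + 1):
        pmf.append((a + b / k) * pmf[-1])
    return pmf

# Poisson(lam) is the (a, b, 0) member with a = 0, b = lam, p0 = e^{-lam}.
lam = 2.0
pmf = panjer_ab0_pmf(0.0, lam, math.exp(-lam), 10)

# Cross-check against the closed-form Poisson pmf.
for k, p in enumerate(pmf):
    direct = math.exp(-lam) * lam ** k / math.factorial(k)
    assert abs(p - direct) < 1e-12
```

The (a, b, 1) and (a, b, r) modifications described in the abstract keep this recursion for k above the modified range while freely assigning the low-count probabilities.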
An approach to modeling the number of claims in this case is to use Panjer's (a, b, 1) class of discrete distributions. In this thesis, we look at a more general case of this class of distributions in which there is an excess of claims at counts 0 up to some value r. We modify the existing (a, b, 1) model by assigning values greater than 0 to p0 (the probability of no claims) all the way up to pr (the probability of r claims). We then analyze this new model in terms of goodness of fit to actual claim data and compare it with the classical (a, b, 1) and (a, b, 0) classes of discrete distributions. This is done using maximum likelihood estimation (MLE) to estimate the parameters of each distribution discussed. In addition, the Akaike information criterion (AIC) is used to choose between competing distributions. This new model is called the (a, b, r) class of distributions, where r > 1.

Item Aerial image segmentation (2016-07-25) Althwaini, Abdulkareem Ali
Image segmentation plays a vital role in applications such as remote sensing, where aerial image segmentation is a special case of image segmentation. Aerial images have some unique features, such as noise in natural landscapes, which need to be addressed in order to obtain an optimal solution. Bushes and rocks are examples of landscape features with diverse and variable pixel values that need to be distinguished by the segmentation process. Smoothing filters are a common solution to the problem of noise in images, including aerial images. Several image segmentation techniques are used for aerial image segmentation; some are more sensitive to noise than others, which makes it necessary to discriminate between different smoothing filters. In this thesis, a number of different aspects of aerial image segmentation and their solutions are explained.
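As an illustration of the smoothing filters discussed above, one of the simplest is the 3x3 mean (box) filter, which replaces each pixel by the average of its neighbourhood (this is a generic textbook filter, not the novel filter proposed in the thesis):

```python
def mean_filter(img):
    """Smooth a 2-D grayscale image (list of lists) with a 3x3 box
    filter, averaging each pixel with its in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A single bright noise pixel is attenuated (spread out) by the filter.
noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smooth = mean_filter(noisy)
```

Filters like this suppress the speckle-like noise of bushes and rocks at the cost of blurring region boundaries, which is exactly the trade-off a segmentation pipeline must weigh.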
In addition, a novel smoothing filter is introduced and compared with other methods using different segmentation techniques. Finally, all of the previous points are applied to a real-world problem.

Item Air pollutant forecasting using deep learning (2021-09-16) Ketul, Dave
In nearly every country, whether developing or developed, air pollution has become a serious issue as urbanization and industrialization have increased. Governments and citizens are greatly concerned about air pollution, which has a negative impact on human health, the well-being of all life forms, and global economic development. Traditional air quality forecast systems use numerical data, which necessitates more computing resources for pollutant concentration measurement and yields poor results. We used commonly used deep learning models to address this problem. The pollutant studied was Particulate Matter 10 (PM10). This study examines the methods and techniques for predicting air quality using deep learning. Various deep learning models have been investigated. This research incorporates a recurrent neural network (RNN), a long short-term memory (LSTM), a gated recurrent unit (GRU) and a bidirectional long short-term memory (Bi-LSTM) for forecasting. The dataset is primarily comprised of pollution and meteorological time series data from AirNet China and the United States Environmental Protection Agency. We studied various architectures and their variations in topologies and model parameters in order to decide the best architecture. Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) were used to assess the models. Each experiment was run for up to 1000 epochs by varying the learning rate, the number of nodes in a layer, and the total number of hidden layers. According to the results, all models performed admirably in terms of prediction.
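A single step of the GRU cell used in such architectures can be sketched in numpy. This is a generic GRU update (Cho et al.'s formulation) with hypothetical, randomly initialized weights; a real forecaster would learn these from the PM10 series:

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: gates z (update) and r (reset) control how much
    of the previous hidden state h is kept versus overwritten."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde         # blend old and new state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4   # e.g. 3 meteorological inputs, 4 hidden units
W = [rng.standard_normal((n_hid, n_in)) for _ in range(3)]
U = [rng.standard_normal((n_hid, n_hid)) for _ in range(3)]
h = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):   # run 5 time steps
    h = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
```

Because the new state is a convex blend of the old state and a tanh-bounded candidate, the hidden activations stay in (-1, 1), which is part of why gated units train more stably than plain RNNs on long series.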
For the AirNet dataset, the GRU-based architecture produced the best outcome, while for the EPA dataset, the LSTM-based architecture outperformed the other models.

Item An analysis of claim frequency and claim severity for third party motor insurance using Monte Carlo simulation techniques (2019-08-22) Dumais, Cedric
The purpose of this thesis is to introduce the reader to multiple regression and Monte Carlo simulation techniques in order to find the expected compensation an insurance company needs to pay due to claims made. With a fundamental understanding of probability theory, we can advance to Markov chain theory and Markov chain Monte Carlo (MCMC). In the insurance field, in particular non-life insurance, expected compensation is very important for calculating the average cost of each claim. Applying Markov models, simulations will be run in order to predict claim frequency and claim severity. A variety of models will be implemented to compute claim frequency. These claim frequency results, along with the claim severity results, will then be used to compute an expected compensation for third party auto insurance claims. Multiple models are tested and compared.

Item An analysis of lung cancer survival using multi-omics neural networks (2022-01-27) Naik, Krinakumari
A key goal of precision health medicine is to improve cancer prognosis. Although numerous models can forecast differential survival from data, algorithms that can assemble and select important predictors from progressively complex data inputs are urgently required. As a result, these models should be capable of providing more information about which types of data are most significant for improving prediction. Because they are adaptable and account for data density in a non-linear manner, deep learning-based neural networks may be a feasible solution for both difficulties. In this study, we use deep learning-based networks to study how gene expression data predicts Cox regression survival in lung cancer.
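Survival models like these are commonly evaluated with the concordance index: the fraction of comparable patient pairs that the predicted risks order correctly (higher risk should mean earlier event). A minimal sketch with toy data:

```python
def concordance_index(times, events, risks):
    """C-index over comparable pairs: pair (i, j) is comparable when
    i has an observed event and t_i < t_j; it is concordant when the
    model assigns i the higher risk. Ties in risk count as 0.5."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example: risks perfectly anti-ordered with survival time -> c = 1.
c = concordance_index([2, 4, 6, 8], [1, 1, 1, 0], [0.9, 0.7, 0.4, 0.1])
```

A c-index of 0.5 is chance-level ordering; values like the 0.635 reported for SALMON indicate moderately better-than-chance risk ranking.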
SALMON (Survival Analysis Learning with Multi-Omics Neural Networks) is an algorithm that collects and simplifies gene expression data and cancer biomarkers in order to enable prognosis prediction. When more omics data was included in model construction, the results (concordance index = 0.635 and log-rank test p-value = 0.00881) showed that performance improved. We employ eigengene modules from the results of gene co-expression network analysis as model inputs in place of raw gene expression values. The algorithm identified specific mRNA-seq co-expression modules and clinical features that play crucial roles in lung cancer prognosis, revealing various biological functions by exploring how each contributes to the hazard ratio. SALMON also performed well compared to other deep learning survival prognosis models.

Item Analyzing impact on bitcoin prices through Twitter social media sentiments (2022-04-28) Patel, Jay
Many cryptocurrencies exist today, and many more are on the verge of being brought into circulation. A cryptocurrency is a form of digital currency, but instead of being run by a centralized authority or government, it has a decentralized structure created using blockchain technology. These currencies are highly influential and unpredictable, with factors of influence from all over the world. This research revolves around the most renowned cryptocurrency, Bitcoin. The focus here is on the relationship of Bitcoin with the prominent online media platform Twitter. Twitter takes part in the discussion of almost all major and related incidents and events around the world. It is a social media platform so informative and useful to the public that even major personalities and politicians take to it to express their views quickly on important matters.
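Tweet-level sentiment scoring of the kind used in this line of work can be sketched with a toy lexicon-based scorer. The word list and scores here are purely illustrative; the thesis's actual sentiment tooling is not specified in the abstract:

```python
# Hypothetical mini-lexicon; real systems use large curated lexicons
# or trained models.
LEXICON = {"moon": 1, "bullish": 2, "gain": 1,
           "crash": -2, "scam": -2, "dump": -1}

def sentiment(tweet):
    """Sum lexicon scores of the words in a tweet and map the total
    to a polarity label."""
    score = sum(LEXICON.get(w.strip("#!.,").lower(), 0)
                for w in tweet.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = sentiment("Bitcoin to the moon! Bullish gain today")
```

Aggregating such labels per day and plotting them against the price series is the usual way a tweet-versus-price relationship is then examined.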
The research first gathered the tweets, divided into two parts - verified and non-verified users - and then cleaned the data to ensure that only the desired and necessary data was left for further research. The tweets regarding Bitcoin were analyzed and examined more deeply so that sentiment could be extracted and visualized against Bitcoin prices, to draw a conclusion about the relationship between Twitter and Bitcoin prices. The analysis returned many insights and inferences about the influence that Bitcoin prices and related tweets have on each other. The results of the report state the outcome of the analysis and whether the original hypothesis held.

Item Application of advanced diagonalization methods to quantum spin systems (Laurentian University of Sudbury, 2014-05-13) Wang, Jieyu
Quantum spin models play an important role in theoretical condensed matter physics and quantum information theory. One numerical technique that is frequently used in studies of quantum spin systems is exact diagonalization. In this approach, numerical methods are used to find the lowest eigenvalues and associated eigenvectors of the Hamiltonian matrix of the quantum system. The computational problem is thus to determine the lowest eigenpairs of an extremely large, sparse matrix. Although many sophisticated iterative techniques for the determination of a small number of lowest eigenpairs can be found in the literature, most exact diagonalization studies of quantum spin systems have employed the Lanczos algorithm. In contrast, other methods have been applied very successfully to the similar problem of electronic structure calculations. The well-known VASP code, for example, uses a Block Davidson method as well as the residual minimization method - direct inversion in the iterative subspace (RMM-DIIS).
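The lowest-eigenpair computation at the heart of exact diagonalization can be illustrated with SciPy, whose `eigsh` wraps ARPACK's implicitly restarted Lanczos method. A small tridiagonal matrix stands in here for the much larger spin Hamiltonian:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy sparse symmetric matrix (tridiagonal), standing in for the
# Hamiltonian matrix of a quantum spin model.
n = 200
H = diags([np.full(n - 1, -1.0), np.arange(n, dtype=float),
           np.full(n - 1, -1.0)], offsets=[-1, 0, 1])

# Lowest two eigenpairs via Lanczos ('SA' = smallest algebraic).
vals, vecs = eigsh(H, k=2, which='SA')

# Residual check: H v should equal lambda v for each eigenpair.
for lam, v in zip(vals, vecs.T):
    assert np.linalg.norm(H @ v - lam * v) < 1e-8
```

The point of Lanczos-type methods is precisely this: only matrix-vector products with the sparse H are needed, so the full (dense) matrix never has to be formed.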
The Davidson algorithm is closely related to the Lanczos method but usually needs fewer iterations. The RMM-DIIS method was originally proposed by Pulay and later modified by Wood and Zunger. The RMM-DIIS method is particularly interesting if more than one eigenpair is sought, since it does not require orthogonalization of the trial vectors at each step. In this work I study the efficiency of the Lanczos, Block Davidson and RMM-DIIS methods when applied to basic quantum spin models such as the spin-1/2 Heisenberg chain, ladder and dimerized ladder. I have implemented all three methods and am currently applying them to the different models. I will compare the three algorithms based on the number of iterations needed to achieve convergence and the required computational time. An Intel Many Integrated Core system with an Intel Xeon Phi 5110P coprocessor, which integrates 60 cores with 4 hardware threads per core, was used for the RMM-DIIS method; the achieved parallel speedups were compared with those obtained on a conventional multi-core system.

Item Applying teeline shorthand using leap motion controller (2017-02-08) Zang, Weikai
A hand gesture recognition program was developed to recognize users' Teeline shorthand gestures as English letters, words and sentences using the Leap Motion Controller. The program is intended to provide a novel way for users to interact with electronics by waving gestures in the air to input text instead of using keyboards. In the recognition mode, the dynamic time warping algorithm is used to compare the similarities between templates and gesture inputs and to summarize the recognition results; in the edit mode, users are able to build their own gestures to customize the commands.
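Dynamic time warping, used above to compare gesture templates against inputs, aligns two sequences of possibly different lengths by minimizing cumulative pointwise distance. A minimal sketch over 1-D sequences (Leap Motion traces are higher-dimensional, but the recursion is the same):

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    via the classic O(len(a) * len(b)) dynamic program."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: diagonal match, or stretch either sequence.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# A time-stretched copy of a gesture matches far better than a
# different gesture, which is why DTW suits variable-speed input.
template = [0, 1, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
different = [2, 0, 2, 0, 2]
```

Recognition then amounts to computing this distance against every stored template and picking the nearest one.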
A series of experiments shows that the program can achieve considerable recognition accuracy, and that it performs consistently across different user groups.

Item Augmented reality based indoor navigation using point cloud localization (2021-06-24) Patel, Vishva
People of various ages may find it difficult to navigate complex building structures as these become more prevalent. The future belongs to a world that is artificially facilitated, and Augmented Reality will play a significant role in that future. This research explores the concept of indoor navigation using smartphone-based Augmented Reality technology. Using readily available and affordable tools, this study proposes a solution to this issue. We built an Augmented Reality-based framework to assist users in navigating a building using ARWAY, a software development toolkit. To find the shortest paths, we used Point Cloud Localization and the A* pathfinding algorithm. A shop inside a shopping centre, a particular room in a hotel, and other locations can be easily located using this app, and the user is given fairly precise visual assistance through their smartphone to reach their desired spot. The proposed framework is based on augmented reality, and point clouds are its most important components. The application allows the user to choose their desired destination as well as change their destination at any time. To evaluate the technical, subjective, and demographic responses, we used hypothesis testing and validation with statistical analysis and exploratory data analysis methods.

Item BERT-based multi-task learning for aspect-based sentiment analysis (2022-01-20) Bhagat, Yesha
Aspect-Based Sentiment Analysis (ABSA) systems aim to extract aspect terms (e.g., pizza, staff member), opinion terms (e.g., good, delicious), and their polarities (e.g., positive, negative, and neutral), which can help customers and companies identify product weaknesses.
By solving these product weaknesses, companies can enhance customer satisfaction, increase sales, and boost revenues. There are several approaches to performing ABSA tasks, such as classification, clustering, and association rule mining. In this research we have used a neural network-based classification approach. The most prominent neural network-based methods for ABSA tasks include BERT-based approaches such as BERT-PT and BAT. These approaches build separate models for each ABSA subtask, such as aspect term extraction (e.g., pizza, staff member) and aspect sentiment classification. Furthermore, the two approaches use different training algorithms, such as post-training and adversarial training. Moreover, they do not consider the subtask of opinion term extraction. This thesis proposes a new system for ABSA, called BERT-ABSA, which uses a Multi-Task Learning (MTL) approach and differs from these previous approaches by solving all three tasks - aspect term extraction, opinion term extraction, and aspect-term-related sentiment detection - simultaneously, taking advantage of similarities between the tasks to enhance the model's accuracy and reduce training time. To evaluate our model's performance, we have used the SemEval-14 Task 4 restaurant datasets. Our model outperforms previous models in several ABOM tasks, and the experimental results support its validity.

Item Biocybernetic closed-loop system to improve engagement in video games using electroencephalography (2022-01-06) Klaassen, Stefan
The purpose of this paper was to determine the level of engagement with specific stimuli while playing video games. The modern video game industry has a large and wide audience and is becoming ever more popular and accessible to the public. The interactions and rewards offered in video games are key to keeping player engagement high.
Understanding the player's brain and how it reacts to different types of stimuli would help to continue improving games and advance the industry into a new era. Although the study of human engagement started many years ago, measuring it in video game players is more recent and still an evolving field of research. This thesis takes an objective approach by measuring engagement through electroencephalogram (EEG) readings and examining whether it can help improve current dynamic difficulty adjustment (DDA) systems for video games, leading to more engaging and entertaining games. Although statistically significant findings were not found in this experiment, techniques for future experiments were laid out in the form of classifier comparisons and program layouts.

Item #BlackLivesMatter Movement and consequences of racism: a data and sentiment analysis on Tweets in the USA (2021-03-17) Zolfaghari, Amir Hossein
Introduction: A movement arose in the middle of a challenging pandemic time. In a year when everybody kept their six-foot distance and masks on, many came to the streets or started publishing social media content asking for Black rights. It was after the unjust killing of a Black man - George Floyd - by a police officer that #BlackLivesMatter trended again as the top conversation in the world. Hence, our question became how racism - specifically on social media - is associated with Black lives. Methods: We carried out an ecological retrospective study on Twitter data for the year 2020 with location tags inside the USA. We created inclusion criteria to shape our dataset and categorized tweets into separate groups.
Our groups were (1) "BLM" for those supporting the "BlackLivesMatter" movement; (2) "Anti-BLM" containing tweets in opposition to the first group; (3) "Ambiguous" for those with content from both previous groups; and (4) "Racists" comprising those who included offensive n-words in their tweets. We drew statistical data from previous research on the "Life Expectancy," "Poverty Rates," "Educational Attainment," and "Race Composition" factors of the Black and white populations in the USA by state. We employed additional techniques to identify genders and classify records by their states. Moreover, we applied sentiment analysis using Python. We calculated the final rates considering each group's statistics relative to the sum of all tweets published in each state. The analysis of the final rates in correlation with the employed tables was done in IBM SPSS Statistics 26. Results: We found 43,830,301 tweets with location data inside the USA in this time frame, and 306,925 of them qualified for our study. A noticeable initial observation was the sharp increase in #BlackLivesMatter usage after George Floyd's death on May 25, 2020, although the hashtag has a history going back to 2013. There is a positive correlation between the rates of offensive-content tweets and the life expectancy of Black males. The same tweets showed an association whereby, wherever racism is higher, more people suffer poverty. Rather surprisingly, the BlackLivesMatter movement's supporters were mostly among those with bachelor's or advanced degrees. By contrast, if a state had lower rates of high school degrees, more racist tweets existed there. The rates of aggressive tweets are higher in areas with larger Black populations and lower in predominantly white states. Regarding the sentiment analysis, the majority of tweets were written in an objective form, with a slight increase after the mentioned event.
The polarities were also mostly neutral. The most negative sentiment belonged to the BLM supporters, at a rate of 46% before and 33% after the event. Conclusion: This project was undertaken to evaluate the relationship between rates of cyber-racism and anti-racism posts and some real-world indicators. We set our inclusion criteria around the cruel killing of a Black man - George Floyd - by police to investigate the published tweets, classified by supporters and opponents of this story, in addition to those using offensive language towards Black people. This study showed a strong correlation between these concepts, as content on the World Wide Web can carry over into day-to-day conversation. Hence, it shows how a drop in racist behaviors can lead to a world with higher life expectancy, wealth, and education. Reducing color discrimination on social media, particularly toward Black people, could help build a healthier community. Conversely, the rise of such bigoted content has disastrous consequences for these racialized populations.

Item Classification approaches for microarray gene expression data analysis (2015-03-13) Almoeirfi, Makkeyah
Microarray technology is among the vital technological advancements in bioinformatics. Microarray data is usually characterized by noise as well as high dimensionality; therefore, finely tuned data is a requirement for conducting microarray data analysis. Classification of biological samples is the most commonly performed analysis on microarray data. This study focuses on determining the confidence level used for the classification of an unknown gene sample based on microarray data. A support vector machine (SVM) classifier was applied, and the results were compared with other classifiers, including k-nearest neighbor (KNN) and neural network (NN).
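Classifier comparisons of this kind are typically scored with a confusion matrix and the accuracy derived from it. A minimal sketch for binary labels (the toy labels here are illustrative, not the thesis's microarray data):

```python
def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels: returns (tp, fp, fn, tn)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

# Toy predictions, e.g. tumour (1) vs normal (0) samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
```

The ROC curve mentioned in this abstract is obtained the same way, by sweeping the classifier's decision threshold and plotting the resulting true-positive rate against the false-positive rate.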
Four microarray datasets - a leukemia dataset, a prostate dataset, a colon dataset, and a breast dataset - were used in the research. Additionally, the study analyzed two different SVM kernels: the radial kernel and the linear kernel. The analysis was conducted by varying the percentage split between training and test datasets in order to make sure the best partitions of the data provided the best results. The 10-fold cross-validation method and the L1/L2 regularization techniques were used to address over-fitting issues as well as feature selection in classification. The ROC curve and a confusion matrix were applied in performance assessment. The k-nearest neighbor and neural network classifiers were trained with the same sets of data, and the results were compared. The results showed that the SVM exceeded the other classifiers in performance and accuracy. For each dataset, the support vector machine with the linear kernel was the best-performing method, since it yielded better results than the other methods. The highest accuracy on the colon data was 83% with the SVM classifier, while the accuracy of NN on the same data was 77% and that of KNN was 72%. The leukemia data had the highest accuracy of 97% with SVM, 85% with NN, and 91% with KNN. For the breast data, the highest accuracy was 73% with SVM-L2, while the accuracy was 56% with NN and 47% with KNN. Finally, the highest accuracy on the prostate data was 80% with SVM-L1, while the accuracy was 75% with NN and 66% with KNN. SVM showed the highest accuracy as well as the largest area under the curve compared to the k-nearest neighbor and neural network classifiers in the different tests.

Item Collaborative filtering recommender system for predicting drugs for prostate cancer (2021-06-18) Patel, Vishwaben
Prostate cancer is a common type of cancer found in men. Identifying drug targets and inhibitors in drug design is a challenging task.
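Collaborative filtering of the kind this thesis applies predicts an unknown entry of an interaction matrix from similar rows. A minimal numpy sketch using cosine similarity over shared columns (the tiny compound-assay matrix is invented for illustration, not ChEMBL data):

```python
import numpy as np

def predict(R, known, i, j):
    """Predict R[i, j] as a similarity-weighted average of other rows'
    known values in column j (cosine similarity over shared columns)."""
    num = den = 0.0
    for k in range(R.shape[0]):
        if k == i or not known[k, j]:
            continue
        shared = known[i] & known[k]      # columns both rows have rated
        if not shared.any():
            continue
        u, v = R[i, shared], R[k, shared]
        sim = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        num += sim * R[k, j]
        den += abs(sim)
    return num / den if den else 0.0

# Rows = compounds, columns = assays; 1 = active, 0 = inactive.
R = np.array([[1., 1., 0., 1.],
              [1., 1., 0., 0.],   # entry at column 3 is unknown
              [0., 0., 1., 0.]])
known = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 1, 1]], dtype=bool)
score = predict(R, known, 1, 3)
```

Compound 1 behaves like compound 0 on the shared assays, so the sketch predicts it active on the fourth assay; the dissimilar compound 2 contributes nothing.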
Recommender systems (RSs) are regarded as a useful and promising tool; their use has seen unprecedented growth and development and has had a tremendous impact on e-commerce. In this research, RS methods were used to predict the cancer activity class (active/inactive) of compounds extracted from ChEMBL. There are two RS approaches: collaborative filtering and content-based filtering. Of these, collaborative filtering was applied, and the investigation and evaluation of effective class prediction for compounds was successfully conducted. In this research, the interactions among some of the compounds are known; in this way, interaction profiles can be predicted. The classification results are considered relatively good predictions of acceptable quality. We then applied various regression techniques to the dataset: Lasso, Elastic Net (EN), Classification and Regression Trees (CART), k-nearest neighbors (KNN), Support Vector Regression (SVR), Random Forest Regression (RFR), Gradient Boosting Regression (GBR), and Extra Tree Regression (ETR). After analyzing the dataset with these regression techniques and comparing their results, SVR gave the best results; this technique can be used to find compounds to fight prostate cancer in less time and with more efficiency.

Item A comparative study of D2L's performance with a purpose-built e-learning user interface for visual- and hearing-impaired students (2014-08-29) Farhan, Wejdan
An e-learning system in an academic setting is an efficient tool for all students, especially for students with physical impairments. This thesis discusses an e-learning system through the design and development of an e-learning user interface for students with visual and hearing impairments.
This thesis presents the tools and features in the user interface required to make the learning process easy and effective for students with such disabilities. Further, an integration framework is proposed to integrate the new tools and features into the existing e-learning system Desire-To-Learn (D2L). The tools and features added to the user interface were tested by selected visually- and hearing-impaired participants from Laurentian University's population. Two questionnaires were filled out to assess usability for both the D2L e-learning user interface at Laurentian University and the new e-learning user interface designed for students with visual and hearing impairments. After collecting and analyzing the data, the results for usability factors such as effectiveness, ease of use, and accessibility showed that the participants were not completely satisfied with the existing D2L e-learning system, but were satisfied with the proposed new user interface. The results also showed that the tools and features proposed for students with visual and hearing impairments can be integrated into the existing D2L e-learning system.

Item A comparative study on traffic collisions severity using machine learning approaches (2021-06-02) Rathod, Rajvi
Road traffic collisions and congestion are among the most crucial issues in the modern world. Every year, traffic collisions cause multiple deaths and injuries, and lead to economic losses as well. According to the WHO, approximately 1.35 million people lose their lives, and 20 to 50 million people suffer non-severe injuries, every year because of road collisions. Hence, there is a need for a prediction system that can determine the relations between various factors - such as climate, type of automobile, and driving pattern - to predict the severity of collisions.
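Tree-based classifiers of the kind compared in such studies grow their trees by choosing splits that reduce impurity. A minimal Gini-impurity sketch over an invented severity column (illustrative only, not the study's U.S. collision data):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 minus the sum of
    squared class proportions (0 = perfectly pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Weighted Gini impurity of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy severity labels: a split that separates 'high' from 'low'
# drives the impurity to zero, so a tree learner would prefer it.
parent = ["high", "high", "low", "low", "low", "high"]
good = split_impurity(["high", "high", "high"], ["low", "low", "low"])
```

A decision tree evaluates every candidate feature threshold this way and keeps the split with the largest impurity drop; a random forest repeats the process over bootstrapped samples and random feature subsets.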
Such a system helps to improve public transportation and allows safer routes, reducing the chances of high-severity cases and making the roads safer. The smart-city concept can be helpful in handling modern problems. Building accurate models for predicting collision severity has become a significant challenge for transportation systems. This research establishes a procedure for identifying important parameters affecting collision severity and creates a relationship between human and environmental factors using several Machine Learning (ML) techniques. Among the different types of ML techniques, classification algorithms have been applied to categorize the level of severity. Supervised algorithms such as Random Forest (RF), Decision Trees (DT), Logistic Regression and Naïve Bayes have been used. A comparative study of the performance and accuracy of the various algorithms is also presented. These algorithms were tested on a dataset that contains historic data for collisions in the U.S. and their severity levels. This study's findings show that Random Forest has the best accuracy and identify time of day, collision duration, and Point of Interest (POI) features as the most influential parameters.

Item Comparing nurse performance between an infusion pump medical device on differing mediums (2019-07-31) Doan, Amy
Medical devices are pervasive in all healthcare environments, and the means by which healthcare professionals interact with medical device user interfaces is of interest to the researcher. In this research study, an intravenous infusion pump model used most frequently in the researcher's local environment was observed with a sample of the local nursing population. This research study aimed to determine the suitability of established user interface evaluations and analytical modelling laws for predicting nurse performance when interacting with the selected medical device user interface.
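The abstract does not name the analytical modelling laws used; one standard HCI example for predicting interaction time is Fitts's law, MT = a + b * log2(D/W + 1) (Shannon formulation), sketched here with hypothetical coefficients:

```python
import math

def fitts_time(a, b, distance, width):
    """Fitts's law (Shannon form): predicted movement time grows with
    the index of difficulty log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1.0)

# Hypothetical coefficients a (intercept, s) and b (slope, s/bit);
# real values would be fitted to observed performance data.
a, b = 0.2, 0.1
near_big = fitts_time(a, b, distance=4.0, width=4.0)    # easy target
far_small = fitts_time(a, b, distance=32.0, width=2.0)  # hard target
```

Summing such per-action estimates over a task sequence is how analytical models produce the predicted task times that a study like this one compares against observed performance.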
This research study also aimed to observe and compare any changes in performance times and reported cognitive load for the same user interface of the selected medical device on two different mediums: the actual medical device and a simulated user interface mock-up on a handheld tablet device. This research study concluded that the differences in performance task times and reported cognitive load between the two mediums were minor and not statistically significant. When evaluating the estimated performance task times generated through usability evaluations and analytical modelling laws, this research study concluded that although the estimated times were similar to the average performance times of the whole sample, these estimates are not reliable for predicting individual task times. Additionally, this research study highlighted how additional factors, such as performing safety checks and the time each user takes to complete these safety checks, influence the time required to complete a task.

Item Computer-interpretable guidelines using GLIF with Windows workflow foundation (Laurentian University of Sudbury, 2014-09-22) Minor, Ryan
Modern medicine increasingly uses evidence-based medicine (EBM). EBM has become an integral part of medical training and, ultimately, of practice. Davis et al. [6] describe the "clinical care gap," where actual day-to-day clinical practice differs from EBM, leading to poor outcomes. This thesis researches the GLIF specification and implements the foundation for a GLIF-based guideline system using Windows Workflow Foundation 4.0. There exists no public-domain, computer-implementable guideline system. The guideline system developed allows a guideline implementer to create a guideline visually using certain medical-related tasks, and to test and debug it before implementation.
Chapter 5 of this thesis shows how to implement a guideline called the Group A Streptococcal Disease Surveillance Protocol for Ontario Hospitals, which is of fundamental importance for Ontario hospitals. The workflow approach allows developers to create custom tasks should the need arise. The Workflow Foundation provides a powerful set of base classes for implementing clinical guidelines.

Item Condition monitoring of a fan using neural networks (Laurentian University of Sudbury, 2015-02-24) Zhang, Bo