Computational Sciences - Master's theses
Permanent URI for this collection: https://laurentian.scholaris.ca/handle/10219/2096
Browsing Computational Sciences - Master's theses by Author "Chaudhary, Vikaskumar"
Prediction and survival analysis of head and neck cancer in patients using epigenomics data and advanced machine learning methods (2023-08-22)
Chaudhary, Vikaskumar
Epigenomics is the field of biology dealing with modifications of the phenotype that do not alter the DNA sequence of the cell. Epigenetic modifications attach chemical marks on top of the DNA that change its properties and can prevent certain DNA behavior from being performed. Such modifications occur in cancer cells and contribute to the development of cancer. The main objective of this research is to perform prediction and survival analysis of Head and Neck Squamous Cell Carcinoma (HNSCC), which is one of the leading causes of death and accounts for more than 650,000 cases and 330,000 deaths annually worldwide. The main risk factors associated with head and neck cancer are tobacco use, alcohol consumption, Human Papillomavirus (HPV) infection (for oropharyngeal cancer), and Epstein-Barr Virus (EBV) infection (for nasopharyngeal cancer). Males are affected more often than females, at a ratio ranging from 2:1 to 4:1. Four different types of data are used in this research to predict HNSCC in patients: methylation, histone, human genome, and RNA-sequence data. The data is accessed through open-source technologies in the R and Python programming languages. The data is processed to create features, and the prediction of HNSCC is obtained from a fine-tuned model with the help of statistical analysis and advanced machine learning techniques. The optimal model was determined to be ResNet50 using the Sobel feature selection method for image data and ReliefF-based feature selection for clinical features, achieving a test accuracy of 97.9%. The model's precision score was 0.929, its recall score was 0.930, and its F1 score was 0.930.
Additionally, the ResNet101 model performed best when using the Histogram of Gradients feature selection method for image data and mutual information-based feature selection for clinical features, yielding a test accuracy of 96.1%. Its precision, recall, and F1 scores were identical to those of the aforementioned ResNet50 model. The research also used Kaplan-Meier survival analysis to investigate the survival rates of patients based on various factors, including age, gender, smoking status, tumor size, and tumor site. The results of this analysis demonstrated the effectiveness of the method in providing valuable insights for risk assessment.
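For readers unfamiliar with the Kaplan-Meier method mentioned in the abstract, the following is a minimal sketch of the estimator in plain Python. It is not the thesis's implementation: the function name `kaplan_meier` and the five-patient toy cohort are hypothetical and serve only to illustrate how the survival probability is updated at each observed event time while censored patients leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    # Number of observed events at each distinct time
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Number of patients leaving the risk set (event or censoring) at each time
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who exited at t is no longer at risk afterwards
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (not data from the thesis): times in months
toy_durations = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]  # 0 marks a censored patient
for time, prob in kaplan_meier(toy_durations, toy_events):
    print(f"t={time}: S(t)={prob:.3f}")
```

Comparing groups, for example smokers and non-smokers, amounts to calling the function once per group and comparing the resulting curves. A dedicated survival analysis library would normally be used in practice, since it also provides confidence intervals and significance tests.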