Survival analysis and prediction of lung cancer in patients based on clinical and image features using machine learning

dc.contributor.advisor: Dr. Kalpdrum Passi
dc.contributor.author: Chhetri, Kiran
dc.date.accessioned: 2025-01-22T18:51:13Z
dc.date.available: 2025-01-22T18:51:13Z
dc.date.issued: 2023-01-15
dc.description.abstract: Lung cancer develops in lung tissues, most commonly in the cells that line the airways. It is the leading cause of cancer death in both men and women. Estimating the prevalence of lung cancer in the coming years requires diagnosing it in its early stages. This thesis proposes a reliable approach to diagnosing patients with lung cancer. The goal of the research is to analyze, based on p-values, the important variables affecting lung cancer, using image features as well as clinical data, with a focus on quality analysis. Further, to enable early and efficient diagnosis, this work proposes classifying patients' images for cancer using a Convolutional Neural Network (CNN). The thesis discusses the dataset, the data pre-processing steps, survival rate risk analysis, classification, and performance evaluation. The study used two kinds of data, clinical and image, sourced from the Genomic Data Commons (GDC) Data Portal and The Cancer Imaging Archive (TCIA). Missing values were filled in with a Random Forest regression estimation method, which first imputes all missing data with the mean or mode and then, for each variable with missing values, fits a random forest on the observed part and predicts the missing part. Three models were used to test the significance of variables for cancer survival rates: Kaplan-Meier (KM), Cox Proportional Hazards (CPH), and Accelerated Failure Time (AFT). The analysis considered three types of data: clinical only, image only, and combined clinical and image data. All three models were applied successfully, and the results identified the most robust data type and the crucial variable to focus on in further experimentation. For classification, a CNN with low computational cost and time overhead was used.
The output of the statistical models shows that image data is the most robust of the three types, as it is the least likely to produce false results. Image data, which is common in clinical data collection, is also less prone to human error. Because of this robustness, image feature data alone was preferred over the clinical and combined data for the next step, the classification of images for cancer prediction. The CNN results were compared on accuracy with two ensemble approaches, Random Forest (RF) and XGBoost. The CNN achieved an accuracy of 99% in image classification, higher than RF and XGBoost, which both reached 95.83%. The CNN model can therefore be applied to new Computerized Tomography (CT) scan images for lung cancer diagnosis, both to support additional research and to assist clinicians.
dc.identifier.uri: https://laurentian.scholaris.ca/handle/10219/4246
dc.language.iso: en_CA
dc.publisher: Laurentian University Library & Archives
dc.rights.holder: Kiran Chhetri
dc.rights.license: Laurentian University ETD license
dc.subject: Lung Cancer Survival Analysis, Random Forest, XgBoost, Convolutional Neural Network, Genomic Data, The Cancer Imaging Archive, Kaplan Meier, Cox-Proportional hazards model, Accelerated Failure Time Model
dc.title: Survival analysis and prediction of lung cancer in patients based on clinical and image features using machine learning
dc.type: Thesis
thesis.degree.discipline: Computational Science
thesis.degree.grantor: Laurentian University (en_CA)
thesis.degree.level: 1
thesis.degree.name: Master of Science (MSc) in Computational Science
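
The abstract names Kaplan-Meier as one of the three survival models. As a rough illustration of what that estimator computes, here is a minimal sketch in plain Python. It is not the thesis's code: the function name `kaplan_meier` and the toy cohort are hypothetical and chosen only for this example. At each distinct time with at least one observed death, the running survival probability is multiplied by (1 - deaths / patients still at risk). Censored patients leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: scale by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (months of follow-up); NOT data from the thesis.
toy_durations = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(toy_durations, toy_events)
for time, prob in curve:
    print(f"t={time:>2}  S(t)={prob:.4f}")
```

In practice a survival-analysis library would normally be used for this, along with the Cox and AFT models; the sketch only makes the product-limit step explicit.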

Files

Original bundle

Name: Thesis FINAL_Kiran Chhetri_07-Feb-2024.pdf
Size: 2.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.92 KB
Format: Item-specific license agreed upon to submission