Improved prediction of gene expression of epigenomics data of lung cancer using machine learning and deep learning models

Date
2020-02-26
Abstract
Epigenetics is the study of biological mechanisms that switch genes on and off. Epigenetic alterations are deeply involved in changes in gene expression across various diseases, including cancers. Machine learning is frequently used in cancer diagnosis and detection. In this research, four types of data are used to predict lung cancer accurately: DNA methylation data, histone data, human genome data, and RNA-Seq data. Four feature selection methods were implemented: ReliefF, Gain Ratio (GR), Principal Component Analysis (PCA), and Correlation-based Feature Selection (CFS). Seven classifiers were compared: Random Forest (RF), Support Vector Machine (SVM) with a Gaussian kernel, SVM with a linear kernel, Logistic Regression (LR), Naive Bayes (NB), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN). The data sets were processed with custom R scripts, and feature selection and classification were performed with Weka 3 and Python. Using these machine learning and deep learning methods, we improved the accuracy and area under the curve (AUC) of lung cancer prediction relative to an earlier published work. The CNN model outperformed the other six classification methods.
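One of the feature selection methods listed above, Gain Ratio, ranks each feature by how much it reduces uncertainty about the class label, normalized by the feature's own entropy: GR(C, A) = (H(C) - H(C|A)) / H(A). The sketch below is a minimal, stdlib-only illustration of that standard definition. It is not the thesis's Weka or Python pipeline. It assumes the features have already been discretized (for example, methylation levels binned into "low"/"high"), and the probe names and sample labels are hypothetical toy data, not values from the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain ratio (H(C) - H(C|A)) / H(A) for one discrete feature A."""
    n = len(labels)
    h_c = entropy(labels)
    # H(C|A): entropy of the labels within each feature value, weighted by frequency.
    h_c_given_a = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        h_c_given_a += (count / n) * entropy(subset)
    split_info = entropy(feature)  # H(A), the intrinsic information of the feature
    if split_info == 0.0:
        return 0.0  # a constant feature cannot separate the classes
    return (h_c - h_c_given_a) / split_info


def rank_features(columns, labels):
    """Return (name, gain ratio) pairs sorted from most to least informative."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: six samples, three discretized probes.
    labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
    features = {
        "probe_A": ["high", "high", "high", "low", "low", "low"],  # separates classes perfectly
        "probe_B": ["high", "low", "high", "low", "high", "low"],  # weakly related to class
        "probe_C": ["mid"] * 6,                                     # constant, uninformative
    }
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f}")
    # Prints:
    # probe_A: 1.000
    # probe_B: 0.082
    # probe_C: 0.000
```

Dividing by H(A) penalizes features with many distinct values, which plain information gain tends to favor. In a pipeline like the one described, the top-ranked features would then be passed to the classifiers.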
Keywords
Epigenomics, deep learning, histone modification, DNA methylation, RNA-sequencing, feature selection, classification