Improved prediction of gene expression of epigenomics data of lung cancer using machine learning and deep learning models
Date
2020-02-26
Abstract
Epigenetics is the study of the biological mechanisms that switch genes on and off.
Epigenetic alterations are deeply involved in changes in gene expression across various
diseases, including cancers. Machine learning is frequently used in cancer diagnosis and
detection. In this research, four types of data are used to predict lung cancer: DNA
methylation data, histone data, human genome data, and RNA-Seq data. Four feature
selection methods were implemented in this study: ReliefF, Gain Ratio (GR), Principal
Component Analysis (PCA), and Correlation-based Feature Selection (CFS). Seven
classifiers were also implemented: Random Forest (RF), Support Vector Machine (SVM)
with a Gaussian kernel, SVM with a linear kernel, Logistic Regression (LR), Naive
Bayes (NB), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN).
The data sets were processed with custom R scripts, and feature selection and
classification were performed with Weka 3 and Python. Using these machine learning and
deep learning methods, we improved the accuracy and area under the curve (AUC) of lung
cancer prediction compared with an earlier published work. The CNN model outperformed
the other six classification methods.
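As a minimal illustration of two quantities named above, the following Python sketch (standard library only) ranks features by Gain Ratio and scores a classifier's outputs by AUC. It is not the Weka 3 or Python code used in this work. The function names and toy data are hypothetical, and features are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `labels` given a discretized `feature`,
    divided by the feature's own entropy (its split information)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = tumor, 0 = normal.
labels = [1, 1, 1, 0, 0, 0]
feature_a = ["hi", "hi", "hi", "lo", "lo", "lo"]  # perfectly separates the classes
feature_b = ["hi", "lo", "hi", "lo", "hi", "lo"]  # weakly related to the classes
print(gain_ratio(feature_a, labels), round(gain_ratio(feature_b, labels), 4))
print(round(auc([0.9, 0.8, 0.4, 0.5, 0.2, 0.1], labels), 4))
```

With this toy data, `feature_a` receives a Gain Ratio of 1.0 and `feature_b` about 0.0817, so a Gain Ratio filter would keep `feature_a` first. The AUC example gives 8/9 (about 0.8889), because one positive score (0.4) falls below one negative score (0.5). Dividing by split information is what distinguishes Gain Ratio from plain information gain: it penalizes features that split the samples into many small groups.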
Keywords
Epigenomics, deep learning, histone modification, DNA methylation, RNA-sequencing, feature selection, classification