Faster Markov blanket with tabu search for efficient feature selection of microarray cancer datasets
Date
2019-12-23
Abstract
In medical science, particularly for diseases with genetic causes, the proper classification of
genes is necessary to prescribe a cure. Genes need to be classified according to the particular
characteristics that influence the cancer. Feature selection methods are recognized as important in
this domain, and they matter even more when the datasets contain a large number of variables
(genes). In this thesis, we focus on the Tabu search technique combined with the Markov Blanket
algorithm for feature selection and classification of microarray gene expression data. The HITON
implementation of the Markov Blanket algorithm for feature selection was implemented and compared
with the HITON plus Tabu search method on microarray gene expression datasets. We propose HITON
with Tabu search as a feature selection algorithm to obtain high classification performance on
high-dimensional microarray cancer datasets. Applying Tabu search with the HITON algorithm achieved
higher accuracy when tested with three classifiers: KNN, SVM, and NN. On the Prostate dataset,
HITON + Tabu with SVM achieved 99.40% classification accuracy, whereas HITON with SVM gave 98.04%,
an increase of 1.36 percentage points. On the Leukemia dataset, HITON + Tabu with KNN achieved
99.83% classification accuracy, whereas HITON with KNN gave 98.81%, an increase of 1.02 percentage
points. On the Lung Cancer dataset, HITON + Tabu with SVM achieved 92.36% classification accuracy,
whereas HITON with SVM gave 89%, an increase of 3.36 percentage points. In addition, the proposed
algorithm can be generalized to solve various other optimization problems.
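To make the idea of refining a feature subset with Tabu search concrete, the following is a minimal sketch and not the thesis implementation. It assumes a wrapper-style fitness (leave-one-out accuracy of a 1-nearest-neighbour classifier), a one-feature flip as the neighbourhood move, and a fixed tabu tenure. HITON itself is not implemented here: the `start` subset is only a stand-in for the candidate features a Markov Blanket step would return. The toy data and all function names and parameters (`make_toy_data`, `loo_1nn_accuracy`, `tabu_search`, `tenure`) are hypothetical.

```python
import random


def make_toy_data(n_samples=40, n_features=8, seed=0):
    """Synthetic data (an assumption for illustration): features 0 and 1 carry the class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the given feature columns."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def tabu_search(X, y, start, n_features, iterations=30, tenure=3):
    """Refine a starting feature subset by flipping one feature per iteration."""
    current = frozenset(start)
    best = current
    best_score = loo_1nn_accuracy(X, y, sorted(current))
    tabu_until = {}  # feature index -> last iteration at which flipping it is forbidden
    for it in range(iterations):
        candidates = []
        for f in range(n_features):
            neighbour = current ^ {f}  # add f if absent, drop it if present
            score = loo_1nn_accuracy(X, y, sorted(neighbour))
            is_tabu = it <= tabu_until.get(f, -1)
            # Aspiration: a tabu move is still allowed if it beats the best score so far.
            if not is_tabu or score > best_score:
                candidates.append((score, -len(neighbour), f, neighbour))
        if not candidates:
            continue
        # Take the best admissible move even if it is worse than the current subset;
        # accepting such moves is what lets the search leave local optima.
        score, _, f, neighbour = max(candidates, key=lambda c: c[:3])
        current = neighbour
        tabu_until[f] = it + tenure
        # Prefer higher accuracy, then fewer features.
        if (score, -len(neighbour)) > (best_score, -len(best)):
            best, best_score = neighbour, score
    return sorted(best), best_score


if __name__ == "__main__":
    X, y = make_toy_data()
    start = [0, 5, 6]  # placeholder for a Markov Blanket candidate set
    print("start accuracy:", loo_1nn_accuracy(X, y, start))
    print("after tabu search:", tabu_search(X, y, start, n_features=8))
```

Because the best subset is only replaced when a move improves on it, the returned accuracy can never be lower than that of the starting subset. In a real microarray setting the fitness would normally be a cross-validated score from the chosen classifier (KNN, SVM, or NN), and evaluating every single-feature flip would likely need to be restricted to a sampled neighbourhood.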
Keywords
Feature Selection, Microarray Data, Markov Blankets, Wrapper Methods, HITON, Tabu Search, Fitness Function, Crossover, Mutation, Cancer Classification, Support Vector Machine, Neural Network, Gene Selection