A novel gene selection algorithm for cancer identification based on random forest and particle swarm optimization
To identify informative genes among the thousands of candidate genes that may contribute to cancer, we propose two novel gene selection approaches for the classification of multiclass microarray datasets. In the first method, we use k-means clustering to remove redundancy and then apply Random Forest (RF) to rank the genes within each cluster to remove irrelevance. The top-scoring genes from each cluster are gathered to form a new feature subset (the filtered genes). In the last stage, the filtered genes are used as input to eight benchmark classification methods. In the second approach, we develop a novel method that combines Particle Swarm Optimization (PSO) with a BoostedC5.0 decision tree as the classifier. We apply the filtered genes obtained by the first method as input to the PSO+BoostedC5.0 classifier and compare its performance with that of the eight benchmark classifiers. Experimental results show that clustering followed by RF ranking selects a smaller feature subset and yields better classification accuracy. Moreover, by applying this method to ten microarray datasets and using the filtered genes as input to the nine classifiers, we show that the proposed PSO+BoostedC5.0 simplifies features effectively and obtains higher classification accuracy than the other classification methods.
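The first method (k-means to group redundant genes, then RF ranking inside each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and NumPy, uses RF feature importances as the ranking score, and the function name `select_filtered_genes`, the parameters `n_clusters` and `top_per_cluster`, and the synthetic data are all hypothetical choices made for illustration. The PSO+BoostedC5.0 stage is not covered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def select_filtered_genes(X, y, n_clusters=5, top_per_cluster=2, seed=0):
    """Return sorted column indices of the top-ranked genes from each cluster.

    X: (n_samples, n_genes) expression matrix; y: class labels.
    All names and default values here are illustrative assumptions.
    """
    # Cluster the genes (columns): each gene is described by its
    # expression profile across samples, so similar genes group together.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(X.T)

    selected = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        # Rank the genes of this cluster by random forest importance.
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X[:, members], y)
        order = np.argsort(rf.feature_importances_)[::-1]
        # Keep only the best-scoring genes of the cluster.
        selected.extend(members[order[:top_per_cluster]].tolist())
    return sorted(selected)


# Synthetic stand-in for a microarray dataset: 60 samples, 40 genes, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = rng.integers(0, 3, size=60)
X[:, 0] += 2.0 * y  # make gene 0 depend on the class label

filtered = select_filtered_genes(X, y)
X_filtered = X[:, filtered]  # reduced matrix passed on to any classifier
```

The reduced matrix `X_filtered` plays the role of the filtered genes that the abstract feeds to the benchmark classifiers. In practice the number of clusters and the number of genes kept per cluster would need tuning for each dataset.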