A hybrid support vector machine strategy for ranking SNPs in genome-wide association studies
Abstract:
In genome-wide association studies, we wish to rank SNPs so that truly associated SNPs are
placed higher than unassociated ones. The support vector machine (SVM) provides a
discriminative alternative to the widely used chi-square statistic. We propose a hybrid
strategy that combines the chi-square statistic with the support vector machine and study its
performance on simulated data and the Wellcome Trust Case Control Consortium (WTCCC) studies.
We show that our strategy ranks causal SNPs in simulated data significantly higher than the
chi-square test and SVM alone. We also show that our strategy ranks previously replicated
SNPs and associated regions (where applicable) of type 1 diabetes, rheumatoid arthritis, and
Crohn's disease higher than the chi-square, SVM, SVM-RFE, and the HMM SNP rankings. In WTCCC
studies with low signal strength, such as type 2 diabetes, our method offers no
advantage. Finally, we show that our strategy yields an economical set of SNPs that predict
disease risk more accurately than previously replicated SNPs and top ranked SNPs in the
chi-square and SVM rankings for type 1 diabetes and rheumatoid arthritis, as measured by the
area under the curve of the widely used composite odds ratio score.
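
The abstract does not say how the chi-square statistic and the SVM are combined, so the
following is only a minimal sketch of one plausible reading: keep the top k SNPs by
chi-square, then re-rank those k by the magnitude of their linear SVM weights. The function
names, the parameter k, the genotype coding (0, 1, 2 minor-allele counts), and the
Pegasos-style trainer are all assumptions made for illustration, not the authors' method.

```python
import random

def chi_square(genotypes, labels):
    """Pearson chi-square on the 2 x 3 table of status (0/1) by genotype (0/1/2)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][g] += 1
    n = len(labels)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(3)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip genotype classes that never occur
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def linear_svm_weights(X, y, lam=0.01, epochs=200, seed=0):
    """Assumed stand-in trainer: Pegasos-style subgradient descent, no bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            sign = 1 if y[i] == 1 else -1
            margin = sign * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * sign * xj for wj, xj in zip(w, X[i])]
    return w

def hybrid_rank(X, y, k):
    """Hypothetical hybrid: chi-square filter to k SNPs, then re-rank by |SVM weight|."""
    m = len(X[0])
    chi = [chi_square([r[j] for r in X], y) for j in range(m)]
    by_chi = sorted(range(m), key=lambda j: -chi[j])
    top, rest = by_chi[:k], by_chi[k:]
    X_top = [[r[j] - 1 for j in top] for r in X]  # recode genotypes to -1, 0, 1
    w = linear_svm_weights(X_top, y)
    reranked = [j for _, j in sorted(zip((-abs(v) for v in w), top))]
    return reranked + rest  # SNPs outside the top k keep their chi-square order

# Toy data: rows are individuals, columns are SNPs; labels are 0 = control, 1 = case.
X = [[0, 1, 2], [0, 1, 0], [0, 0, 1], [2, 1, 2], [2, 1, 0], [2, 2, 1]]
y = [0, 0, 0, 1, 1, 1]
ranking = hybrid_rank(X, y, k=2)
```

In this sketch k controls how much of the ranking the SVM is allowed to reorder; SNPs below
the cutoff are left in chi-square order.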
U. Roshan, S. Chikkagoudar, Z. Wei, K. Wang, H. Hakonarson,
A hybrid support vector machine strategy for ranking SNPs in genome-wide association studies
Submitted