A new multilayer hierarchy model for classifying weighted data point: SNP genotype calls
Most current studies of the association between Single Nucleotide Polymorphisms (SNPs) and disease rest statistically on SNP genotyping results. In SNP microarray technology, the accuracy of genotype calls on 2-D plots is the key to successful data analysis. Ambiguous or incorrect cluster classification causes wrong genotype calls and misleads the associated statistical results. The SNP spots on a plot can be weighted data points, because several spots with the same reaction results can share the same coordinates. Cluster classification and outlier detection are the essential processes for determining genotype calls, and both must take these weighted points into consideration. Most current clustering algorithms are based on rules or statistical methods that do not take weights into consideration. They have limitations or suit only certain types of data, and might not work for classifying weighted data. In addition, new microarray systems produce huge volumes of data points in a short period through the interactions of millions of SNPs with hundreds of samples. The ultimate goal, therefore, is to classify SNP clusters automatically and accurately, and to fail the outliers, in high-throughput mode. In this paper, a new two-phase hierarchical model is presented that robustly classifies SNP data into clusters without supervision and successfully fails the outliers.
ACM International Conference Proceeding Series
Huang, Ching Yu, "A new multilayer hierarchy model for classifying weighted data point: SNP genotype calls" (2016). Kean Publications. 1718.