An AutoML Approach for Predicting Risk of Progression to Active Tuberculosis based on Its Association with Host Genetic Variations

Document Type

Conference Proceeding

Publication Date



Tuberculosis (TB) is a worldwide health challenge. Mycobacterium tuberculosis (M.tb) can evade the host immune system, which can lead to tuberculosis infection. Household contacts (HHCs) of TB cases have a higher risk of infection. Novel predictive techniques are needed to identify high-risk, TB-susceptible groups. Susceptibility to tuberculosis is associated with host genetic variations. This research work uses the TPOT AutoML tool to map genetic variations to TB infection status mathematically. Machine learning was employed to predict the risk of progression to active tuberculosis based on the associated host genetic variations. Of the three TPOT configurations used, "TPOT Default" and "TPOT sparse" produced the same best performance, both reaching a training CV score of 0.816 and a testing accuracy of 0.625. The gene variants identified using this approach were found to make distinct contributions to TB infection, as represented by the feature importance of the classifier. The feature importance of the random forest classifier pipeline in "TPOT sparse" was adopted. The top ten contributing genes were also submitted to Enrichr for gene pathway enrichment analysis. The identified enriched pathways have been shown to be key to TB infection.

Publication Title

ACM International Conference Proceeding Series

First Page Number


Last Page Number



