About TB-Lineage  [Please cite]




The goal of our project is to develop mathematical models for analyzing and merging information in heterogeneous genotyping and epidemiological databases, and to use these models to develop tools for the control, understanding, prevention, and treatment of infectious diseases. Tuberculosis (TB) is a serious re-emerging health threat worldwide. TB infection continues to grow, causing over 2 million deaths each year, even though it is largely curable with proper treatment. Our efforts have concentrated on developing mathematical models for spacer oligonucleotide typing (spoligotyping) data and Mycobacterium Interspersed Repetitive Units Variable Number Tandem Repeat (MIRU-VNTR) typing data. Spoligotyping exploits polymorphism in the direct repeat region of the chromosome of Mycobacterium tuberculosis complex (MTBC) bacteria. The method yields a simple binary pattern for each TB patient and is widely used for MTBC strain discrimination.
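To make the "simple binary pattern" concrete, here is a minimal Python sketch. It assumes the standard 43-spacer spoligotyping format, which this page does not state explicitly, and the function names are illustrative rather than part of TB-Lineage. A pattern is written as a string of 1s (spacer present) and 0s (spacer absent) and converted to the commonly used 15-digit octal shorthand.

```python
def parse_spoligotype(pattern: str) -> list[int]:
    """Turn a 43-character string of 0/1 into a list of ints (assumed format)."""
    bits = [int(ch) for ch in pattern.strip()]
    if len(bits) != 43 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected 43 characters, each 0 or 1")
    return bits


def to_octal(bits: list[int]) -> str:
    """Octal shorthand: 14 digits for spacers 1-42 (three spacers per digit),
    followed by one digit (0 or 1) for spacer 43."""
    digits = []
    for i in range(0, 42, 3):
        a, b, c = bits[i:i + 3]
        digits.append(str(4 * a + 2 * b + c))
    digits.append(str(bits[42]))
    return "".join(digits)


# Example: spacers 1-34 absent, spacers 35-43 present.
example = parse_spoligotype("0" * 34 + "1" * 9)
print(to_octal(example))
```

The list-of-ints form is convenient for rule checks on individual spacers, while the octal string is compact enough to use as a database key.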

The classification of MTBC strains into genetic groups is important for tracking transmission patterns and for developing a better understanding of pathologic specificities in TB. MTBC genetic groups have been defined at different levels of granularity. TB-Lineage can be used to predict major lineages as defined by the United States Centers for Disease Control and Prevention (CDC). It can also be used to predict SITVIT clades, or sub-lineages, as defined by the WHO Supranational TB Reference Laboratory, Tuberculosis and Mycobacteria Unit, Institut Pasteur de la Guadeloupe (IP).

Using two different methods, RULES and CBN, TB-Lineage can predict the following 7 major MTBC genetic lineages as defined by the CDC:
  • Modern Lineages
    • East-Asian (Beijing)
    • Euro-American
    • East-African Indian
  • Ancestral Lineages
    • Indo-Oceanic
    • M. africanum
      • West African 1
      • West African 2
    • M. bovis

Major lineages are predicted by two different methods. In the RULES method, a set of rules was developed to classify isolates into these strain groups based on the presence or absence of spacer sequences in the spoligotype pattern and the values of the MIRU loci. The RULES method requires spoligotypes and can also use MIRU. In the CBN method, major lineages are predicted using a conformal Bayesian network. CBN can predict major lineages using spoligotype alone, MIRU alone, or both. Details on the RULES method can be found in this paper:

A Shabbeer, LS Cowan, C Ozcaglar, N Rastogi, SL Vandenberg, B Yener, KP Bennett. TB-Lineage: An online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infection, Genetics and Evolution 12 (4), 789-797, 2012.

Details on CBN can be found in: A Shabbeer, LS Cowan, C Ozcaglar, N Rastogi, SL Vandenberg, B Yener, KP Bennett. TB-Lineage: An online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infection, Genetics and Evolution 12 (4), 789-797, 2012.
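The idea behind the RULES method, classifying an isolate by the presence or absence of particular spacers, can be sketched in a few lines of Python. The two rules below are placeholders loosely based on widely reported spacer-deletion signatures. They are assumptions for illustration only and are not the rule set used by TB-Lineage, which is described in the paper cited above and can also use MIRU values. The 43-spacer pattern length and all names are likewise assumed.

```python
def spacers_absent(bits, first, last):
    """True if every spacer numbered first..last (1-based, inclusive) is absent."""
    return not any(bits[first - 1:last])


# Hypothetical, simplified rules for illustration; NOT the TB-Lineage rules.
# Each entry is (lineage label, predicate on the 43-element 0/1 pattern).
ILLUSTRATIVE_RULES = [
    ("East-Asian (Beijing)",
     lambda b: spacers_absent(b, 1, 34) and sum(b[34:43]) >= 3),
    ("Euro-American",
     lambda b: spacers_absent(b, 33, 36)),
]


def classify(bits):
    """Return the label of the first matching rule, or 'Unclassified'."""
    for lineage, rule in ILLUSTRATIVE_RULES:
        if rule(bits):
            return lineage
    return "Unclassified"


# Example: spacers 1-34 absent, spacers 35-43 present.
pattern = [0] * 34 + [1] * 9
print(classify(pattern))
```

Rules are checked in order, so a more specific signature should be listed before a more general one. A pattern that matches no rule is left unclassified rather than forced into a lineage.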

TB-Lineage can predict 69 MTBC clades as defined by SITVIT and Institut Pasteur using the KBBN method.

Further insight into the genetic groups of MTBC can be gained by examining the SITVIT sublineage/clade of a strain as defined by IP. We designed a knowledge-based Bayesian network (KBBN), which treats sets of expert rules as a prior distribution over classes. KBBN uses data to refine rule-based classifiers when the rule set is incomplete or ambiguous. KBBN is a predictive model for 69 MTBC clades found in the SITVIT international collection. A table of the 69 SITVIT clades predicted by KBBN can be found here.
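As a rough illustration of "rules as a prior, refined by data", the following Python sketch combines a rule-derived prior over clades with per-spacer likelihoods. It is a toy model under stated assumptions: spacers are treated as independent given the clade (a naive Bayes simplification), the clade names and all probabilities are made up, and the actual KBBN structure may differ substantially.

```python
import math


def posterior(bits, rule_prior, spacer_probs):
    """Posterior over clades: rule-derived prior times Bernoulli likelihoods.

    rule_prior:   {clade: prior probability suggested by expert rules}
    spacer_probs: {clade: [P(spacer i present | clade), ...]}, each strictly
                  between 0 and 1 (in practice estimated from labeled data)
    """
    log_scores = {}
    for clade, prior in rule_prior.items():
        if prior <= 0:
            continue  # the rules exclude this clade outright
        log_p = math.log(prior)
        for bit, p in zip(bits, spacer_probs[clade]):
            log_p += math.log(p if bit else 1.0 - p)
        log_scores[clade] = log_p
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Toy example with 4 spacers and two hypothetical clades. The rules alone
# cannot separate them (equal prior), so the data-driven likelihood decides.
rule_prior = {"clade_A": 0.5, "clade_B": 0.5}
spacer_probs = {
    "clade_A": [0.9, 0.9, 0.1, 0.1],
    "clade_B": [0.9, 0.1, 0.9, 0.1],
}
result = posterior([1, 1, 0, 0], rule_prior, spacer_probs)
print(max(result, key=result.get))
```

When the rules are unambiguous, the prior dominates. When they are ambiguous or silent, the likelihood learned from data breaks the tie, which is the behavior the paragraph above describes.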

For more information, consult the following paper: M. Aminian, D. Couvin, A. Shabbeer, K. Hadley, S. Vandenberg, N. Rastogi, K. P. Bennett. Predicting Mycobacterium tuberculosis Complex Clades Using Knowledge-Based Bayesian Networks. BioMed Research International, to appear, 2013.