TB-Insight: Run SPOTCLUST

Run SPOTCLUST

A Tool to Cluster Spoligotype Data for Tuberculosis Evolution and Epidemiology

Mathematical Models for Genotyping and Epidemiological Data on Infectious Diseases Project

Tools: Octal to Binary / Binary to Octal
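For readers who want to convert formats offline, the following is a minimal sketch of the usual spoligotype octal convention: the first 42 spacers are read as 14 groups of three binary digits, each written as one octal digit, and the 43rd spacer is appended as a single 0 or 1. The function names are hypothetical, and it is an assumption that the tools linked above follow exactly this convention.

```python
def binary_to_octal(binary: str) -> str:
    """Convert a 43-character spoligotype ('1' = spacer present) to a 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters, each 0 or 1")
    # Spacers 1-42: fourteen triplets, each read as one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended unchanged as a final 0 or 1.
    return "".join(digits) + binary[42]


def octal_to_binary(octal: str) -> str:
    """Convert a 15-digit octal spoligotype code back to its 43-character binary form."""
    if (len(octal) != 15
            or set(octal[:14]) - set("01234567")
            or octal[14] not in "01"):
        raise ValueError("expected 14 octal digits followed by a final 0 or 1")
    # Each of the first 14 digits expands to three binary digits.
    triplets = [format(int(d, 8), "03b") for d in octal[:14]]
    return "".join(triplets) + octal[14]
```

For example, a pattern with all 43 spacers present, `binary_to_octal("1" * 43)`, gives `777777777777771`, and `octal_to_binary` reverses the conversion.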

Users are requested to cite the following paper in publications that benefit from this tool: Inna Vitol, Jeffrey Driscoll, Barry Kreiswirth, Natalia Kurepina, Kristin P. Bennett, "Identifying Mycobacterium tuberculosis Complex Strain Families using Spoligotypes", Infection, Genetics and Evolution, Volume 6, Issue 6, November 2006, Pages 491-504.

SPOTCLUST is a novel approach to advancing global studies of Mycobacterium tuberculosis complex (MTC) genotyping data. It uses mixture models to identify MTC strain families from their spacer oligonucleotide typing (spoligotyping) patterns. The algorithm incorporates biological information on spoligotype evolution without attempting to derive the full phylogeny of MTC. We applied the algorithm to spoligotype patterns of strains isolated between 1996 and 2004, primarily from tuberculosis patients in New York State. Two models were used to identify strain families in the data: a 36-component model based on the spoligotype database SpolDB3, and a randomly initialized model with 48 components. Our results confirm previously defined families of MTC strains and suggest certain new ones. The approach could serve as a simple first-step tool for tuberculosis epidemiology.
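To give a feel for mixture-model clustering of binary spacer patterns, here is a minimal sketch of a generic mixture of independent Bernoulli variables fitted with EM. It is not the SPOTCLUST model itself, which builds in biological information on spoligotype evolution that this sketch omits. The function name, the random initialization range, the iteration count, and the toy data are all assumptions made for illustration.

```python
import math
import random


def fit_bernoulli_mixture(patterns, k, n_iter=50, seed=0):
    """Fit a k-component mixture of independent Bernoullis with EM.

    patterns: list of equal-length 0/1 lists (e.g. 43 spacers per isolate).
    Returns (weights, probs, labels); labels come from the final E-step.
    """
    rng = random.Random(seed)
    n, d = len(patterns), len(patterns[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so log() is defined
    weights = [1.0 / k] * k
    # Hypothetical initialization: random spacer-presence probabilities.
    probs = [[rng.uniform(0.3, 0.7) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each pattern.
        resp = []
        for x in patterns:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for xi, p in zip(x, probs[j]):
                    lp += math.log(p if xi else 1.0 - p)
                logp.append(lp)
            top = max(logp)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and per-spacer probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, eps)
            for i in range(d):
                pj = sum(r[j] * x[i] for r, x in zip(resp, patterns)) / max(nj, eps)
                probs[j][i] = min(max(pj, eps), 1.0 - eps)
    # Hard assignment: most probable component for each pattern.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, probs, labels


# Toy data: two artificial "families" with different blocks of spacers present.
family_a = [[1] * 10 + [0] * 33 for _ in range(5)]
family_b = [[0] * 33 + [1] * 10 for _ in range(5)]
weights, probs, labels = fit_bernoulli_mixture(family_a + family_b, k=2)
```

On toy data like this, the two artificial families should end up in different components. Real spoligotype collections are far noisier, which is one reason a model informed by how spoligotypes evolve, as described above, can be preferable to a generic one.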

Read the full paper here