Using Topological Data Analysis to find discrimination between microbial states in human microbiome data

Mehrdad Yazdani, Larry Smarr and Rob Knight. (2016).

Abstract

The vast collection of microbial cells referred to as the human microbiome forms an ecology of diverse microbial organisms that lives with us in symbiosis. Because this ecology differs dramatically across body sites and individuals, understanding what changes in it, and how, is of crucial importance. In this study we investigate Topological Data Analysis (TDA) as an unsupervised learning and data exploration tool for identifying changes in microbial states. We compare TDA with other well-established methods, such as Principal Component Analysis (PCA) and Principal Coordinate Analysis (also known as Multidimensional Scaling, or MDS). For this comparison we use a previously published dataset of high-resolution microbiome time series from three body sites (mouth, hands, and gut) of two healthy subjects (one female, one male). Previous studies have shown that the microbial communities of healthy subjects are highly stable over time unless disturbed by an external variable. We therefore expect to identify six microbial communities, one for each combination of body site and subject in our dataset. We show that PCA and MDS reveal three distinct clusters that correspond to the three body sites. However, these methods do not discriminate samples by subject. In contrast, TDA identifies distinct groups that discriminate between the female and male gut samples and that also separate the skin and tongue body sites. This suggests that TDA can identify groups of clusters that other methods may miss.
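The abstract names Principal Coordinate Analysis (classical MDS) as one of the baselines. The rough Python sketch below shows how such an ordination is typically computed from a sample-by-sample distance matrix. It is an illustration only, and several parts of it are assumptions. The Bray-Curtis dissimilarity, the helper names `bray_curtis_matrix` and `pcoa`, and the toy count table are all hypothetical choices for this example. The abstract does not state which distance metric or software the authors used. The sketch also does not implement TDA.

```python
import numpy as np


def bray_curtis_matrix(counts):
    """Pairwise Bray-Curtis dissimilarities between rows of a samples-by-taxa table.

    Assumption: Bray-Curtis is an illustrative choice; the abstract does not
    say which distance the study used.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def pcoa(dist, k=2):
    """Classical MDS / principal coordinate analysis of a distance matrix.

    Returns an (n, k) array of sample coordinates and the k leading eigenvalues.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh handles symmetric matrices; sort eigenvalues from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = eigvals[order]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them to zero.
    coords = eigvecs[:, order] * np.sqrt(np.clip(top_vals, 0.0, None))
    return coords, top_vals


# Hypothetical toy data (not from the study): two samples from each of two "sites".
toy_counts = np.array([
    [50, 40, 5, 5],
    [45, 45, 6, 4],
    [4, 6, 50, 40],
    [5, 5, 45, 45],
])
coords, vals = pcoa(bray_curtis_matrix(toy_counts), k=2)
print(np.round(coords, 3))
```

In this toy table, the first principal coordinate places the two hypothetical sites on opposite sides of the origin. According to the abstract, methods of this kind separated the three body sites in the real data but did not separate the two subjects.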
