Tu164 - Group Similarity Analysis (GSA) Is a Visualization Method to Evaluate Batch Effects and Sample Stratification in High-dimensional Mass Cytometry Data
Tuesday, June 20, 2023
6:00 PM – 7:45 PM
Joo Guan Yeo; Valerie Chew; Jing Yao Leong; Salvatore Albani
Abstract Text: High-dimensional mass cytometry or CyTOF (cytometry by time-of-flight) measures over 50 proteins in single cells. We apply mass cytometry to profile immunome remodeling during healthy human development and disease. As data acquisition usually spreads over many experimental batches, it is essential to separate the contributions of biological and technical factors to measured immune variations. We created a two-step method, dubbed group similarity analysis (GSA), to visualize immune profiles obtained from unsupervised clustering. First, the multi-dimensional vectors representing the proportions of detected cell populations in each sample are projected into a two-dimensional embedding, such as UMAP, t-SNE or PCA. Coloring data points according to batch or biological properties reveals similarities between samples. Second, silhouette analysis of the embedding coordinates quantifies the similarity of samples to their respective groups. We illustrate how GSA helped us evaluate the benefits of batch normalization in two projects. In the EPIC reference immune atlas (Nature Biotechnology, 2020), which portrays the development of the healthy immune system from birth to old age, GSA demonstrated a significant improvement in sample stratification between different age groups. In an immuno-oncology project, GSA showed the improved separation of hepatocellular carcinoma, adjacent non-tumor liver and peripheral blood. Other applications include the benchmarking of batch normalization and clustering methods and their parameters. For instance, we can show that increasing cluster numbers can aggravate batch effects. Furthermore, GSA can assist in the optimization of meta-clustering. In summary, GSA is a versatile tool in high-dimensional pattern discovery.
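The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses PCA (one of the embeddings named in the abstract) computed with NumPy, a hand-written silhouette with Euclidean distance, and simulated cluster proportions. The function names `pca_embedding` and `silhouette_values`, the toy data, and the group and batch labels are all hypothetical.

```python
import numpy as np


def pca_embedding(proportions, n_components=2):
    """Step 1: project sample-by-cluster proportions onto the top principal axes."""
    centered = proportions - proportions.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def silhouette_values(coords, labels):
    """Step 2: per-sample silhouette (b - a) / max(a, b) on embedding coordinates.

    Assumes at least two distinct groups in `labels`.
    """
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples in the embedding.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    groups = np.unique(labels)
    values = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton group: silhouette is conventionally 0
        # a: mean distance to the other members of the sample's own group
        a = dist[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other group
        b = min(dist[i, labels == g].mean() for g in groups if g != own)
        values[i] = (b - a) / max(a, b)
    return values


# Simulated data (assumption): 40 samples, 6 cell populations, two biological
# groups with different population frequencies, and two batches that carry
# no technical effect.
rng = np.random.default_rng(0)
group_a = rng.dirichlet([50, 10, 10, 10, 10, 10], size=20)
group_b = rng.dirichlet([10, 10, 10, 10, 10, 50], size=20)
proportions = np.vstack([group_a, group_b])
biology = np.repeat(["group_a", "group_b"], 20)
batch = np.tile([0, 1], 20)

embedding = pca_embedding(proportions)
biology_score = silhouette_values(embedding, biology).mean()
batch_score = silhouette_values(embedding, batch).mean()
print(f"mean silhouette by biology: {biology_score:.2f}")
print(f"mean silhouette by batch:   {batch_score:.2f}")
```

In this toy setting, a high mean silhouette for the biological labels together with a value near zero for the batch labels would indicate that samples stratify by biology rather than by batch. Comparing such scores before and after batch normalization is one way to read the evaluation the abstract describes.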