JSM2021: Geometric and Topological Information in Data Analysis

Time: Sunday, Aug. 8, 2021, 3:30 PM – 5:20 PM EDT.

Session ID#: 220446 (https://ww2.amstat.org/meetings/jsm/2021/onlineprogram/ActivityDetails.cfm?SessionID=220446)

Session Sponsor: IMS

Session Type: Topic-Contributed

Sub Type: Papers

Place: Joint Statistical Meetings 2021 (registration required) (https://ww2.amstat.org/meetings/jsm/2021/).

Organizer(s): Hengrui Luo, Lawrence Berkeley National Laboratory

Chair(s): Chul Moon, Southern Methodist University


3:35 PM EDT 317318 Characterizing heterogeneous information in persistent homology with applications to molecular structure modeling

Speaker. Zixuan Cang, University of California, Irvine

Abstract. Persistent homology is a powerful tool for characterizing the topology of a dataset at various geometric scales. However, in addition to geometric information, a dataset can carry a wide variety of non-geometric information; for example, molecular structures include element types and atomic charges in addition to the atomic coordinates. To characterize such datasets, we propose an enriched persistence barcode approach that retains the non-geometric information in the traditional persistence barcode. The enriched barcode is constructed by finding the smoothest representative cocycles determined by the combinatorial Laplacian for each persistence pair. We show that when combined with machine learning methods, this enriched barcode approach achieves state-of-the-art performance in an important real-world problem, the prediction of protein-ligand binding affinity based on molecular structures.

3:55 PM EDT 317284 Gromov-Wasserstein learning in a Riemannian framework

Speaker. Samir Chowdhury, Stanford University

Abstract. Geometric and topological data analysis methods are increasingly being used to derive insights from data arising in the empirical sciences. We start with a use case where such techniques are applied to human neuroimaging data to obtain graphs which can then yield insights connecting neurobiology to human task performance. Reproducing such insights across populations requires statistical learning techniques such as averaging and PCA across graphs without known node correspondences. We formulate this problem using the Gromov-Wasserstein (GW) distance and present a recently developed Riemannian framework for GW-averaging and tangent PCA. Beyond graph adjacency matrices, this framework permits consuming derived network representations such as distance or kernel matrices, and such choices lead to additional structure on the GW problem that can be exploited for theoretical and computational advantages. We show how replacing the adjacency matrix representation with a spectral representation leads to theoretical guarantees allowing efficient use of the Riemannian framework, as well as state-of-the-art accuracy and runtime in graph learning tasks such as matching and partitioning.

4:15 PM EDT 317312 Density estimation and modeling on symmetric spaces

Speaker. Didong Li, Princeton University

Abstract. In many applications, data and/or parameters are supported on non-Euclidean spaces. It is important to take into account the geometric structure of manifolds in statistical analysis to avoid misleading results. In this talk, we consider a very broad class of manifolds: non-compact Riemannian symmetric spaces. For this class, we provide statistical models on the tangent space, push these models forward onto the manifold, and easily calculate induced distributions by Jacobians. To illustrate the statistical utility of this theoretical result, we provide a general method to construct distributions on symmetric spaces, including the log-Gaussian distribution as an analogue of the multivariate Gaussian distribution in Euclidean space. With these new kernels on symmetric spaces, any existing density estimation approach designed for Euclidean spaces can be applied, and pushed forward to the manifold with an easy-to-calculate adjustment. We provide theorems showing that the induced density estimators on the manifold inherit the statistical optimality properties of the parent Euclidean density estimator; this holds for both frequentist and Bayesian nonparametric methods.

4:35 PM EDT 317251 Convergence of persistence diagram in the subcritical regime

Speaker. Takashi Owada, Purdue University

4:55 PM EDT 317225 Combining geometric and topological information for boundary estimation

Speaker. Justin Strait, University of Georgia

Abstract. We propose a method which jointly incorporates geometric and topological information to estimate object boundaries in images, through use of a topological clustering-based method to assist initialization of the Bayesian active contour model. Active contour methods combine pixel clustering, boundary smoothness, and prior shape information to estimate object boundaries. These methods are known to be extremely sensitive to algorithm initialization, relying on the user to provide a reasonable initial boundary. This task is difficult for images featuring objects with complex topological structures, such as holes or multiple connected components. Our proposed method provides an interpretable, smart initialization in these settings, freeing up the user from potential pitfalls. We provide a detailed simulation study, and then demonstrate our method on artificial image datasets from computer vision, as well as real-world applications to skin lesion and neural cellular images, for which multiple topological features can be identified.

5:00 PM EDT Discussion and Floor-time

This event is a follow-up to last year’s session: https://appliedtopology.org/tda-at-jsm/