Source material for Topological Data Analysis

To start off the feature articles at appliedtopology.org, I figured it might be worth while collecting good entry points to the field. One of the most common questions I get about persistent homology and topological data analysis is how to get started with our techniques and ideas.

Overview articles and books

First off in the list of entry points is the written word. There are survey articles, overview articles and books written about topological data analysis as a whole, as well as focusing on specific parts.

Topology and Data by Gunnar Carlsson. This survey article came soon after Ghrist’s survey, and covers persistent homology, as well as Mapper for topological simplification and modeling. It also comes with a good discussion of the underlying philosophy of the field.
Start here.

Barcodes: the persistent topology of data by Robert Ghrist. This is the first major survey article to come out, and covers persistent homology and some of its applications.

Topology for computing by Afra Zomorodian. This is the first book format exposition of persistent homology for applied and computational topology. It is a good and self-contained introduction to the field, if ever so slightly dated: in particular, it does not cover anything about zigzag persistence or multi-dimensional persistence.

Computational Topology: an introduction by Herbert Edelsbrunner and John Harer. This book covers the state of the art as of 2010 of computational topology, with some focus on persistent homology: one third of the book is devoted to persistence and its applications. Throughout, the book discusses the underlying theory, the most obvious algorithm, and the fastest known algorithm.

Software packages

So you understand what the underlying ideas of the field are. Next up, you’ll want to try them out on your own data. There are some ways you can go to do this, and they all have their specific strengths and weaknesses.

Plex, jPlex, javaPlex: this sequence of libraries were developed in the Stanford group, and with an explicit aim at always interoperating smoothly and easily with Matlab. Of the three, we currently recommend javaPlex unless this library does not cover your exact use case — in which case some methods may exist in jPlex. Plex is written in C++, and connects to Matlab through a MEX interface, while jPlex and javaPlex are both Java libraries.

Dionysus: this library, written and maintained by Dmitriy Morozov, provides a platform for developing and experimenting with computational topology algorithms in C++ or in Python. It interfaces with CGAL for low dimensional geometric constructions, and has example applications provided for persistent homology, cohomology, vineyards, alphashapes and numerous other common techniques.

Perseus: this package, developed by Vidit Nanda, provides a platform for computing persistent homology for cubical and simplicial complexes generated in a number of different ways. It specifically uses methods based on discrete morse theory for speeding up computations.

pHat: this package, created by Ulrich Bauer, Michael Kerber and Jan Reininghaus builds on results by the authors that speeds up persistence computation by specific tricks that use structures in a persistence boundary matrix. Currently only using Z/2-coefficients and not constructing the complex for you, it seems to be the fastest publicly available package.

CHomP: this software package came out of the CHomP research project, and consists of a rich collection of tools to work persistently or statically with cubical complex data. For homology on image or voxel collection data, CHomP forms the fastest and most complete analysis system available right now.

We warmly appreciate suggestions for more papers, software, or other resources if you have anything to add to this list.

Overview articles and books

Leave a Reply Cancel reply