Giotto-learn, an open-source library for topological machine learning in Python

L2F – Learn to Forecast, the Laboratory for Topology and Neuroscience (EPFL, Switzerland) and the Institute of Reconfigurable and Embedded Systems (HEIG-VD, Switzerland) are excited to announce Giotto, an open-source project aimed at integrating topological data analysis and machine learning at a fundamental level.

Giotto’s objective is to bring topological data analysis closer to the broader data science community, and to gather contributions from experts in the field.

Our first product is the Python library giotto-learn, released on 21 October 2019 under the Apache 2.0 license. We put an emphasis on making giotto-learn intuitive, user-friendly, and performant. It offers a convenient API and is fully compatible with scikit-learn, one of the most widely used general-purpose machine learning libraries.

giotto-learn inherits the modularity and flexibility of the scikit-learn framework and extends its reach to include steps inspired by topological data analysis and by the theory of dynamical systems. Because these steps can be combined into complex pipelines and tuned with scikit-learn’s model selection and hyperparameter search tools, topology-informed machine learning can be performed at larger scales and in the style used in modern data science. Our collaboration’s first paper shows how this allows for an extensive topological analysis of the MNIST digits dataset, including successful classification using topological features only!
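
To make the pipeline idea concrete, here is a minimal sketch that uses only scikit-learn, NumPy and SciPy. It does not use giotto-learn's own classes: `LargestH0Deaths` is a hypothetical transformer written for this illustration, and the toy data (one blob versus two blobs) is made up. The transformer turns each point cloud into a few topological features, namely the largest scales at which connected components merge, which are the lengths of the minimum spanning tree edges. Because it follows the scikit-learn estimator conventions, it can sit in a `Pipeline` and have its parameters tuned by `GridSearchCV`.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class LargestH0Deaths(BaseEstimator, TransformerMixin):
    """Hypothetical transformer for illustration; not part of giotto-learn.

    Maps each point cloud to the n_features largest scales at which two
    connected components merge. These scales are the edge lengths of a
    minimum spanning tree of the cloud.
    """

    def __init__(self, n_features=2):
        self.n_features = n_features

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the training data.
        return self

    def transform(self, X):
        rows = []
        for cloud in X:
            # Dense pairwise distances; assumes the cloud has no duplicate
            # points, since zero entries are read as "no edge".
            distances = squareform(pdist(cloud))
            merge_scales = minimum_spanning_tree(distances).data
            rows.append(np.sort(merge_scales)[::-1][: self.n_features])
        return np.asarray(rows)


rng = np.random.default_rng(0)


def make_cloud(two_clusters):
    """Toy data: 20 points in the plane forming one or two tight blobs."""
    cloud = rng.normal(scale=0.1, size=(20, 2))
    if two_clusters:
        cloud[10:] += 3.0  # shift half of the points far away
    return cloud


y = np.array([0, 1] * 20)
X = np.stack([make_cloud(bool(label)) for label in y])  # shape (40, 20, 2)

pipeline = Pipeline([
    ("topology", LargestH0Deaths()),
    ("classifier", LogisticRegression()),
])

# Tune the topological step and the classifier together.
search = GridSearchCV(
    pipeline,
    param_grid={
        "topology__n_features": [1, 2, 3],
        "classifier__C": [0.1, 1.0, 10.0],
    },
    cv=4,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The same pattern applies when the hand-written transformer is swapped for a library-provided one: any step exposing `fit` and `transform` can be cross-validated and searched over alongside the final estimator.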

While the API is written in Python, the package incorporates compiled C++ code for efficiency. In v0.1.0, Ripser (newly bound to our Python code) is used for fast computation of Vietoris-Rips persistence, and an optimised version of Hera is used for bottleneck and Wasserstein distances.
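
For readers new to these distances, the following is a small sketch of a Wasserstein distance between two persistence diagrams, written with NumPy and SciPy. It is not Hera's algorithm and not giotto-learn's API: it solves the matching problem exactly with `scipy.optimize.linear_sum_assignment`, which is only practical for small diagrams. The function name `wasserstein_distance` and its conventions (points compared in the max norm, unmatched points sent to the diagonal at a cost of half their persistence) are choices made for this illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_distance(diagram_a, diagram_b, p=2.0):
    """p-Wasserstein distance between two persistence diagrams.

    Each diagram is a sequence of (birth, death) pairs. Illustrative
    sketch only: exact assignment, suitable for small diagrams.
    """
    a = np.asarray(diagram_a, dtype=float).reshape(-1, 2)
    b = np.asarray(diagram_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    if n + m == 0:
        return 0.0

    # Square cost matrix: rows are the points of a followed by m diagonal
    # slots, columns are the points of b followed by n diagonal slots.
    cost = np.zeros((n + m, n + m))
    # Matching a point of a with a point of b: max-norm distance.
    cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2) ** p
    # Sending a point of a to the diagonal: half of its persistence.
    cost[:n, m:] = (((a[:, 1] - a[:, 0]) / 2.0) ** p)[:, None]
    # Sending a point of b to the diagonal: half of its persistence.
    cost[n:, :m] = (((b[:, 1] - b[:, 0]) / 2.0) ** p)[None, :]
    # Diagonal-to-diagonal matches stay at zero cost.

    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum() ** (1.0 / p))


diagram_1 = [(0.0, 1.0), (0.0, 4.0)]
diagram_2 = [(0.0, 4.5)]
print(wasserstein_distance(diagram_1, diagram_2))
```

In this example the long-lived points (0, 4) and (0, 4.5) are matched to each other, while the short-lived point (0, 1) is matched to the diagonal. The bottleneck distance uses the same matchings but takes the largest single cost instead of a sum.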

We look forward to your comments, suggestions and merge requests! Giotto’s core team is happy to help and can be reached at [email protected].
