PyCon X


2nd - 5th May 2019

Looking at the oceans with computer vision in Python

The oceans are the largest ecosystem on Earth, an ecosystem which is shaped by micro-organisms. Microscopic plankton produce about half of the oxygen on the planet, yet much is still unknown about these organisms. As part of the Tara Oceans project, we have collected hundreds of thousands of images of microscopic plankton, which were analysed by computer vision in order to measure biological parameters of interest and to classify the organisms taxonomically.

I will demonstrate the principles underlying the image analysis pipeline, which is based on numpy, mahotas, scikit-learn, and jug. The result is a system which can take advantage of HPC clusters, distributing the computation over thousands of nodes in a robust fashion (easy recovery if nodes fail). It works in a completely reproducible fashion, ensuring that the outputs are always up to date. This application will be used as an example for introducing concepts, techniques, and best practices that can be useful for any project with large-scale datasets and access to compute clusters.
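To make the ideas of easy recovery and up-to-date outputs concrete, here is a minimal sketch using only the Python standard library. It is not jug's API and not the pipeline from the talk: `ResultStore`, `run`, and `mean_intensity` are hypothetical names chosen for illustration. The assumption is that each task result is saved under a key derived from the task's name and arguments, so a restarted job skips work that already finished and new inputs always trigger a fresh computation.

```python
import hashlib
import os
import pickle
import tempfile


class ResultStore:
    """Hypothetical on-disk store keyed by a hash of task name and arguments."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, func, args):
        # The key depends on which function is called and on its arguments,
        # so different inputs never share a stored result.
        key = pickle.dumps((func.__module__, func.__qualname__, args))
        return os.path.join(self.directory, hashlib.sha256(key).hexdigest())

    def run(self, func, *args):
        path = self._path(func, args)
        if os.path.exists(path):
            # A previous run (possibly on another node) already finished this task.
            with open(path, "rb") as fh:
                return pickle.load(fh)
        result = func(*args)
        # Write to a temporary file and rename it, so a crash in the middle
        # of writing never leaves a partial result under the final name.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(result, fh)
        os.replace(tmp, path)
        return result


calls = []  # records every real computation, to show when the store is used


def mean_intensity(pixels):
    """Stand-in for a per-image measurement (illustrative only)."""
    calls.append(pixels)
    return sum(pixels) / len(pixels)


store = ResultStore(tempfile.mkdtemp())
first = store.run(mean_intensity, (10, 20, 30))   # computed
second = store.run(mean_intensity, (10, 20, 30))  # loaded from disk
third = store.run(mean_intensity, (10, 20, 60))   # new input, computed again
```

In this sketch the key does not include the function's source code, so editing `mean_intensity` would not invalidate stored results; a real system has to decide how to handle that case.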

I will also take a look back at the history of the Python ecosystem and what made Python the current language of choice for so many machine learning and data analysis problems. Finally, I will discuss the current limitations of Python as an ecosystem for data analysis and what the future may bring.
