PyCon X


2nd - 5th May 2019

High Performance and Scalability Made Easy For Data-Analytics/Machine-Learning Codes

Is your data-analysis code running too long with Scikit-Learn, Numpy, Scipy and/or Pandas? Does your dataset exceed your memory? Are Spark/Dask/… not giving you the scalability and/or ease of use you need?

Intel has implemented optimizations in Numpy, Scikit-Learn, Scipy and Pandas that achieve up to orders of magnitude better performance for many functions compared to the standard implementations. The optimized packages are drop-in replacements: they require no code changes and let you process more data in less time.

Moreover, two new tools let you easily bring your full data-analytics pipeline to unprecedented scales: daal4py and HPAT.

Daal4py is a convenient Python API to Intel® DAAL (Intel® Data Analytics Acceleration Library). Its interface is scikit-learn-like, while its MPI-based engine under the hood scales machine-learning algorithms to bare-metal cluster performance with only minor code changes.

HPAT (High Performance Analytics Toolkit) scales analytics code written with Pandas/Python to bare-metal cluster performance. It automatically compiles a subset of Python (Pandas/Numpy/Daal4py) into efficient parallel binaries that use MPI, again requiring only minimal code changes.

With these tools your code can be orders of magnitude faster than alternatives like Apache Spark, without the pain of working directly with lower-level languages such as C or with message passing.
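The "MPI-based engine" idea can be illustrated without MPI. Below is a minimal plain-Python sketch, assuming the common local-compute / global-reduce pattern used by distributed k-means: each simulated "rank" computes per-cluster partial sums and counts on its own block of data, and one reduction merges them into new centroids. This is not daal4py's API; `local_step`, `reduce_step` and `distributed_kmeans_step` are hypothetical names, and in a real MPI program each block would live on a separate process and the merge would be a collective operation.

```python
"""Conceptual sketch of one distributed k-means iteration (NOT the daal4py API).

Each simulated rank processes only its own block of points; the ranks
exchange only small per-cluster partial results, which are then reduced.
"""
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def local_step(block: Sequence[Point], centroids: Sequence[Point]):
    """Work done independently on one rank: per-cluster sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in block:
        c = nearest(point, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += point[d]
    return sums, counts


def reduce_step(partials, centroids: Sequence[Point]):
    """Merge all ranks' partial results (the role an MPI reduction would play)."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
    # New centroid = mean of assigned points; empty clusters keep the old one.
    return [
        tuple(s / total_counts[c] for s in total_sums[c])
        if total_counts[c]
        else centroids[c]
        for c in range(k)
    ]


def distributed_kmeans_step(data: Sequence[Point], centroids: Sequence[Point], n_ranks: int):
    """Split `data` into at most `n_ranks` contiguous blocks and run one iteration."""
    size = -(-len(data) // n_ranks)  # ceiling division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    partials = [local_step(block, centroids) for block in blocks]
    return reduce_step(partials, centroids)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    init = [(0.0, 0.0), (10.0, 10.0)]
    print(distributed_kmeans_step(data, init, n_ranks=2))
    # -> [(0.0, 0.5), (10.0, 10.5)]
```

The point of the design is that only the small per-cluster partial results cross rank boundaries, never the data itself, so the result does not depend on how many ranks the data is split across.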

Scheduled for Friday 3 May at 15:45
