Company: MAI / NRU HSE / ex-Microsoft
Talk type: Talk
mPyPl: a functional way to organize data processing in Python
Preparing a dataset for machine learning often requires applying a series of transformations to the raw data.
We will talk about a small library developed by the Microsoft Commercial Software Engineering group that lets you describe data processing as a single pipeline with named data streams. Such a library is convenient for data that is too big to fit into a Pandas DataFrame but too small to justify Spark/Databricks.
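To make the idea of "a single pipeline with named data streams" concrete, here is a minimal sketch in plain Python. It does not use mPyPl and is not its API: the helpers `source`, `apply`, `where` and `pipeline`, as well as the field names, are hypothetical and exist only for illustration. Each record is a dict of named fields, and each stage is a lazy generator, so the data is never loaded into memory all at once.

```python
from functools import reduce


def source(values, field):
    """Yield one record (dict) per item, storing the item under a named field."""
    for value in values:
        yield {field: value}


def apply(src, dst, func):
    """Build a stage that computes field `dst` from field `src` (hypothetical helper)."""
    def stage(stream):
        for record in stream:
            yield {**record, dst: func(record[src])}
    return stage


def where(field, predicate):
    """Build a stage that keeps only records whose `field` satisfies `predicate`."""
    def stage(stream):
        for record in stream:
            if predicate(record[field]):
                yield record
    return stage


def pipeline(stream, *stages):
    """Chain the stages left to right over a lazy stream of records."""
    return reduce(lambda current, stage: stage(current), stages, stream)


# Illustrative data: file names stand in for items of a real dataset.
result = list(pipeline(
    source(["a.txt", "bb.txt", "ccc.csv"], "filename"),
    apply("filename", "ext", lambda name: name.rsplit(".", 1)[-1]),
    apply("filename", "name_len", len),
    where("ext", lambda ext: ext == "txt"),
))

for record in result:
    print(record)
```

Because every stage is a generator, records flow through the chain one at a time, and nothing is materialized until `list(...)` consumes the stream. Referring to fields by name keeps each step independent of the order in which earlier fields were added.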