Talk type: Talk

mPyPl: a functional way to organize data processing in Python

  • Talk in Russian

Preparing a dataset for machine learning often requires applying a series of transformations to the data.

We will talk about mPyPl, a small library developed by the Microsoft Commercial Software Engineering group that lets you describe data processing as a single pipeline with named data streams. Such a library is convenient for processing data that is too big to fit into a Pandas DataFrame but too small to justify using Spark/Databricks.
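To make the idea of a single pipeline with named data streams more concrete, here is a minimal sketch in plain Python. It does not use mPyPl and does not reproduce its API: the helper names `from_values`, `apply`, `keep`, and `pipeline` are hypothetical and chosen only for illustration. Each record is a dict whose keys play the role of named fields, and every stage is a generator, so records flow through one at a time instead of being loaded into memory all at once.

```python
from functools import reduce


def from_values(name, values):
    """Lazily wrap each value in a record with a single named field."""
    for value in values:
        yield {name: value}


def apply(src, dst, func):
    """Build a stage that computes field `dst` from field `src`."""
    def stage(records):
        for rec in records:
            # Copy the record so earlier fields stay available downstream.
            yield {**rec, dst: func(rec[src])}
    return stage


def keep(field, predicate):
    """Build a stage that drops records whose `field` fails `predicate`."""
    def stage(records):
        for rec in records:
            if predicate(rec[field]):
                yield rec
    return stage


def pipeline(source, *stages):
    """Chain the stages left to right over a lazy source of records."""
    return reduce(lambda records, stage: stage(records), stages, source)


# Hypothetical example data: nothing is evaluated until list() consumes it.
result = list(pipeline(
    from_values("text", ["  Hello ", "", " functional pipes "]),
    apply("text", "clean", str.strip),
    keep("clean", bool),
    apply("clean", "length", len),
))
```

Because every stage reads and writes fields by name, a new transformation can be added to the chain without changing the stages around it.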

  • #data pipelines
  • #functional programming
  • #pipe
  • #pipelines


