Motivation
If you work as a software engineer, you most likely use a version control system like git for your code, commit to different branches of your project for different features, and rely on a CI/CD pipeline to bring your code more or less automatically to production. Tools like git were designed to track changes in usually small, text-based files and to coordinate large development teams working on the same codebase. In data science the situation is slightly different. Of course, you can use git for your analysis code (and I highly recommend that!), but what happens when you are playing around with different parameter combinations of your machine learning model? Should you commit to a different branch each time you change one parameter of your model? And how do you map the test results of the model to the branch you created? Not ideal, right? If you find yourself ending up with tons of text files or Excel spreadsheets “monitoring” the results of different parameter sets, or copying and pasting evaluation metrics to wiki pages, this article is for you.
But this is not the only challenge you may face in the data science pipeline. Even if documenting parameter changes in text files works for you, what about the resulting models? Committing exported binary files to the code repo will sooner or later result in a complete mess, since git is designed to track changes in text-based files. Your binary, however, will be stored in full each time you change it. So committing ten models with different parameter sets will inflate your repo by ten times the size of the model binary, even if the change in the code file amounts to a few bytes only. As data scientists we therefore need additional tools to manage our workflow in an organized and reproducible way. In this post I will show you one simple solution which covers most of the challenges we’ve discussed so far: MLflow.
Introduction to MLflow
The official website describes MLflow as follows: “MLflow is an open source platform to manage the ML lifecycle […]”
In concrete terms this means, first and foremost, that MLflow keeps track of your parameters, evaluation metrics and resulting models in the form of “experiments” and provides a clean web frontend to inspect all results and developments of your work. It is open source under the Apache-2.0 license, so everybody is free to use it for their own projects. The tool has a lot more to offer, for example packaging your models into complete Docker containers or serving models trained with different frameworks as a REST API, but in this post we will focus on the model tracking functionality. So let’s get started and train our first machine learning model the DevOps way.
An Example Workflow
For demonstration purposes we will train a simple linear regression model on an artificially generated dataset.
Generate Example Data
import numpy as np
import pandas as pd

X = np.random.uniform(low=0.1, high=10, size=(500,))
y = X**2 + np.random.normal(0, 2, 500)
df = pd.DataFrame(data={'X': X, 'y': y})
As you can see, we have only one input variable for the model, and the output is just the squared input plus some random noise. This represents a nonlinear relationship between X and y. In the next step we define our linear regression model. For this, we use an sklearn pipeline.
Select a Machine Learning Model
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

degree = 1
reg = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
The only parameter we can change in this setup is the degree of the polynomial features we use to train the model. A degree of one basically means that the model tries to explain y solely based on the values of X. A degree of two adds a second feature to the linear setup, which is basically a squared version of X; the model then tries to explain y by X and X^2, and so on. Because we generated the dataset ourselves, we already know the usually hidden dependency between X and y. We would therefore expect a very poor result from a model which only tries to explain y with X in a linear way (degree of one). A degree of two, however, should fit our data perfectly except for the noise. Everything above two should not improve the model on an out-of-sample dataset; on the contrary, we would expect a slightly worse result with an increased degree, because the model starts to overfit the data, or in other words, tries to explain the random noise.
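To make the degree parameter a bit more tangible, here is a tiny standalone sketch (not part of our pipeline above) of what PolynomialFeatures produces for a single input column:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0], [3.0]])

# degree=2 expands each value x into [1, x, x^2] (the bias column is added by default)
print(PolynomialFeatures(degree=2).fit_transform(X_demo))
# [[1. 2. 4.]
#  [1. 3. 9.]]

To check our hypothesis, we split the data into training and testing sets.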
Prepare Your Data and Fit the Model
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df[['X']], df['y'], test_size=0.2)
… and fit the model on the training data.
reg.fit(X_train, y_train)
To check the performance, we can use the mean squared error (MSE) on the test set (out-of-sample data that was not seen during training).
Evaluate the Model
from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_test, reg.predict(X_test)))
Ok. Depending on your setup you should see a value above 40. Not a really good result… as expected. So let’s increase the degree parameter of our pipeline to 2 and run the process again. Now the result should look much better, with an MSE around 4. But is there any value in increasing the degree a bit further? Let’s try… Also as expected, you should observe a slightly increased MSE. But what about a degree of 10, maybe… No…
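For reference, what we just did by hand boils down to something like the following sketch (reusing the train/test split from above and refitting the pipeline for a few candidate degrees):

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# refit and evaluate for a few degrees; the numbers only live in the terminal
for degree in [1, 2, 3, 10]:
    reg = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    reg.fit(X_train, y_train)
    mse = mean_squared_error(y_test, reg.predict(X_test))
    print(f"degree={degree}: test MSE={mse:.2f}")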

But wait, what was the best value again?
Even if this is a very simple setup, the general idea should sound familiar if you work as a data scientist. Now let’s bring MLflow into play.
MLflow – The Local Setup
To use MLflow in its most basic form, you only need to install the mlflow package via pip or your preferred way of installing Python packages (how to set up Python environments).
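For example:

pip install mlflow

Once the installation is done, open a new terminal and type: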
mlflow ui
If everything works fine you can now open http://localhost:5000 in your browser and see the web frontend of MLflow.
You are on the experiments tab right now. On the left you can see that a Default experiment is already selected. The experiment is empty, so no training run has been recorded yet. But first, let’s have a look at the key concepts of MLflow.
MLflow Key Concepts
- Experiment: Within an experiment you store everything regarding one analysis or model training setup. A setup basically means you have already selected one type of model and a fixed number of adjustable parameters.
- Run: A run is a single instance of an experiment, if you will. You select the values of the model parameters, define the training, validation and test data, and execute the training and evaluation code. Within a run you can keep track of three output categories:
- Parameters: The actual values you have specified for this single run.
- Metrics: Typically, the evaluation metrics as a basis for the assessment of your model.
- Artifacts: Everything you want to store during the process like figures, reports, or the final model.
Integration of MLflow into the Pipeline
The idea is to adjust your existing pipeline code with some minimal additions that select an experiment, start a run, log parameters and metrics, and finally save the model for future reuse. The adjusted code looks as follows:
import mlflow
import mlflow.sklearn

mlflow.set_experiment("Default")

with mlflow.start_run():
    # log degree parameter as key-value pair
    degree = 1
    mlflow.log_param("degree", degree)

    # train the model
    reg = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    reg.fit(X_train, y_train)

    # log mean squared error on test set
    mse = mean_squared_error(y_test, reg.predict(X_test))
    mlflow.log_metric("mse", mse)

    # save the model as an artifact
    mlflow.sklearn.log_model(reg, "model")
We import mlflow, select an experiment and start a run within a context manager, so we can be sure the run is properly closed even if an error occurs during the process. During the run we log the specified parameters (in our case only the degree), test the quality of our model on an out-of-sample test set based on the MSE, log the result to MLflow and finally save the model. This fairly basic setup can of course be extended to much more complex scenarios, but it illustrates the simplicity of MLflow. If you still have your MLflow instance running on your localhost, you can now execute the code, and after a few seconds you should see the following changes in the UI.
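A quick note on how the script and the UI find each other: by default, MLflow writes runs to a local mlruns folder in the current working directory, and mlflow ui started from that same directory reads exactly this folder. If your tracking server runs somewhere else (for example the remote setup teased at the end of this post), you can point the script to it explicitly; a minimal sketch, assuming the server address:

import mlflow

# assumption: a tracking server is reachable at this address;
# without this call MLflow falls back to the local ./mlruns folder
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Default")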
MLflow UI
Our first run is complete! A click on the start time allows you to explore the run in more detail.
As you can see, all the parameters and metrics are safely stored for this specific setup, and your model is stored as a pickle file in a separate folder on your disk. But as seen before, the MSE is really bad. So let’s adjust the degree parameter to 2 and run the pipeline again. The results will show up in the experiments tab of MLflow.
Finally, let’s compare the two runs by selecting them as shown in the picture and clicking “compare”.
MLflow provides a nice overview of the different parameters and results. You can also visualize how the results depend on the parameters. Of course, in our simple case this doesn’t look really impressive, but imagine a large setup with hundreds of runs and different parameter specifications. With MLflow you instantly see which model performs best and can select its artifacts for further downstream development.
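To pick up the best run programmatically instead of clicking through the UI, a small sketch like the following should work (the experiment name and the metric key mse match the ones logged above; depending on your MLflow version the exact search API may differ slightly):

import mlflow
import mlflow.sklearn

mlflow.set_experiment("Default")

# search all runs of the active experiment, best (lowest) MSE first
runs = mlflow.search_runs(order_by=["metrics.mse ASC"])
best_run_id = runs.loc[0, "run_id"]

# reload the model that was logged under the artifact path "model"
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")
print(best_model.predict(X_test.head()))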
Summary
In this article we’ve looked at MLflow as a tool to automate the monitoring of your analysis and machine learning workflow. It was just an overview of the tracking functionality MLflow provides, and we only used it locally. In a future post we will discuss some more advanced topics, like a cloud deployment of MLflow for real team collaboration and its model serving capabilities.