How to Make your Machine Learning Model Accessible as an API

Motivation

Finally! You have gathered all your data, cleaned and preprocessed it, fitted a model, and the results on the test data fulfill the requirements… Now what?

Depending on your use case it might be sufficient to save the model on your local workstation and use it from time to time to generate a management report or something equally useful ;-). But let’s assume your model should be integrated into a larger system within your company. For example, your model produces sales forecasts for the next day/week/month. Why not integrate the results directly into the company-wide KPI dashboard web application?

Of course, you could send the model to your colleague who is responsible for the dashboard app. And if you are lucky, the dashboard app is also written in Python and you can integrate the model with some reasonable effort. But what happens if the dashboard is not written in Python? And what about new versions of your model? Is it necessary to send a new model to the dashboard people each time you have retrained it? And we have not yet discussed changing input parameters. It is clear that this approach cannot be the last word in wisdom.

One approach to this challenge is to wrap your model inside an API, or more precisely a REST API. With a REST API you can make clearly defined requests to your model based on the technology of the World Wide Web. Usually, a REST API handles the communication between a client (your browser, for example) and a server/database. If you open a website, you basically send a GET request to a server, which processes your request and sends the HTML etc. back to your browser. In our case there is no database on the server side but a machine learning model, which waits for your input data, produces some results and returns them to you.

There are several advantages to this approach. For example, as long as your interface definition stays the same, you can change the code or model in the background without any worries about the API consumers. Therefore, the API can be seen as a contract between the consumer and the developer, where you agree on the input and output format but not on the concrete implementation of the code or model behind it. Only if your model inputs change do you have to adjust the interface. And if you have many consumers that depend on the old format, you can simply use a versioning scheme to make your new model accessible under v2 while leaving v1 in place until every consumer has migrated to the new version. One of the most important advantages of this approach is that any consumer is free to use any kind of tool or programming language to make requests against your model, since the underlying protocol (most likely HTTP/HTTPS) is universal and every modern programming language has libraries to write HTTP requests. So, let’s try it out.

FastAPI

To implement such an interface to your model we use a framework called FastAPI. It allows you to write REST APIs in Python with minimal code and also provides nice features like an autogenerated documentation based on your code. It also integrates pydantic, which allows you to clearly define and validate data types in Python even though Python is not statically typed. This might look a bit strange at the beginning, but it helps us to clearly define the contract with the consumers of our API. So, what does the most basic REST API look like?

from fastapi import FastAPI

app = FastAPI()

# answer GET requests on the root path "/"
@app.get("/")
async def root():
    return {"message": "Hello World"}

That’s really fast. We basically just import FastAPI, create an app object from FastAPI, set the path to an endpoint where our app listens for requests (think of a website like https://jeds-ai.com/endpoint) and implement a function which will be executed in case of a request. Executing such an application, however, is not as straightforward as running a simple Python script. What we need is a server which “serves” our app. There are multiple options, but here we use uvicorn, which is an ASGI (Asynchronous Server Gateway Interface) server implementation. Explaining the details behind it is beyond the scope of this article, but if you are interested you can read more about it here. To start our app we save the code from above inside a main.py file, open a terminal, navigate to the folder with the main.py in it and type:

uvicorn main:app --reload

The --reload flag is for development purposes and reloads your app every time you save changes to the code. If everything works fine you should see something like this:

FastAPI served by uvicorn

What happens here? You can think of your computer as a server now. The main.py application is running on your computer and waits for requests from clients. So we won’t keep it waiting any longer. The IP address 127.0.0.1 represents your localhost (your machine, which is not reachable from the network) and 8000 is the port the service is listening on. So, go to your browser, type http://localhost:8000/ and you see

FastAPI Hello World response in a browser
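
If you prefer the terminal over the browser, you can send the same GET request with curl:

curl http://localhost:8000/

This should return the same Hello World JSON.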

The great news is, it does not become much more complicated than this for our basic setup. We can adjust the main.py with a few lines of code to load our model before starting the application and use it to make predictions on requests which are sent to a /predict endpoint. But wait… what about the input? How can we provide the information to the model in this setting?

The underlying HTTP protocol provides several solutions for this. One option is to use so-called path or query parameters. Path parameters are part of the path. For example, think of http://localhost:8000/12345: we can write our app such that the root endpoint “/” accepts the value behind it as an input parameter, in this case 12345. In the function defined below it, we can then work with this value. Another option is to use query parameters. You can define multiple of these parameters like this: http://localhost:8000/?a=0&b=10. This would provide the values a = 0 and b = 10 to the function. Both approaches are rather limited if you think about the complexity of the data structures you can build with them.
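
To make this more concrete, here is a minimal sketch of both options in FastAPI (the /items path and the parameter names are just illustrative, they are not part of our final app):

from fastapi import FastAPI

app = FastAPI()

# query parameters: http://localhost:8000/?a=0&b=10
@app.get("/")
async def read_query(a: int = 0, b: int = 10):
    return {"a": a, "b": b}

# path parameter: http://localhost:8000/items/12345
@app.get("/items/{value}")
async def read_path(value: int):
    return {"value": value}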

Therefore, in most machine learning applications which are wrapped inside an API you send the actual input data in the so-called body of your request. In the body you can define any kind of information of arbitrary length, and you can structure the data, for example, in JSON format. A typical body of a request could look like this:

{
    "a": 10, 
    "b": 10
} 

Until now we have only used GET requests with our browser. To send data via a body, however, we have to use so-called POST requests. POST requests are usually used to send new objects to the server, for example when you register on an online platform with your name and email. We use this behavior to send a formatted JSON object as input data to the server, which can then be processed by the model. The adjusted code then looks like this:

import pickle

from fastapi import FastAPI
from pydantic import BaseModel

# load the trained model once at startup
model_file = "/Path/to/mlflow/mlflowtest/mlruns/0/ede68b/artifacts/model/model.pkl"
reg = pickle.load(open(model_file, "rb"))


class InputRequest(BaseModel):
    input: float


class OutputResponse(BaseModel):
    input: float
    output: float


app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}


@app.post("/predict", response_model=OutputResponse)
async def predict(input_request: InputRequest):
    # predict returns an array, so we take the first (and only) element
    prediction = float(reg.predict([[input_request.input]])[0])
    return OutputResponse(input=input_request.input, output=prediction)

The model we use in this example is the one we created with the help of MLflow in our previous post. It accepts just one parameter as input (X) and provides one output (y). In this case our model was trained to output the squared value of X. Using pydantic we then define two classes which describe our contract with the consumers of our API. As input we expect a simple object with only one value named input. We specify the type of this value as float. The output will be an object which contains the provided input value and the output value. Both are of type float. We then define a second endpoint named “/predict” which accepts POST requests. Our response will be an object of type “OutputResponse” and contains the unprocessed input and the prediction based on our model. That’s it. That’s the most basic setup to wrap a model inside a REST API. To check if our setup works you only have to save the code and your API should reload automatically. Now you can open your browser again and use one of the cool features FastAPI provides out of the box. Simply go to http://localhost:8000/docs and you should see something like this:

FastAPI docs

This is an automatically generated documentation and testing tool, based on the code and data structures you have provided, following the OpenAPI standard. You basically see all your endpoints and all your data types with examples. And if you select the predict endpoint you can also interact with the service by clicking on “Try it out”. Give it a try and see how the results appear in the form of your defined response data structure.

"Try it out" interaction with FastAPI docs
“Try it out” interaction with FastAPI docs
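
The docs page is not the only way to call the endpoint: since it is all plain HTTP, any client will do. As a small sketch, here is how a consumer could call /predict from Python with the requests library (assuming the server is still running on localhost:8000):

import requests

# send one input value to the /predict endpoint
response = requests.post("http://localhost:8000/predict", json={"input": 5})
print(response.json())  # e.g. {"input": 5.0, "output": 25.0} for our squaring model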

Congratulations! Your machine learning API is ready for now. Of course, this is only the start. In the next posts we will talk about deployment options like Docker for real use cases outside our local workstation and some useful tools for API development like Postman.
