Can we use Pydantic models (BaseModel) directly inside model.predict() using FastAPI, and if not ,why?

Question

I'm using Pydantic model (Basemodel) with FastAPI and converting the input into a dictionary, and then converting it into a Pandas DataFrame to pass it into model.predict() function for Machine Learning predictions, as shown below:

from fastapi import FastAPI
import uvicorn
from pydantic import BaseModel
import pandas as pd
from typing import List

class Inputs(BaseModel):
    f1: float,
    f2: float,
    f3: str

@app.post('/predict')
def predict(features: List[Inputs]):
    output = []

    # loop the list of input features
    for data in features:
         result = {}

         # Convert data into dict() and then into a DataFrame
            data = data.dict()
            df = pd.DataFrame([data])

         # get predictions
            prediction = classifier.predict(df)[0]

         # get probability
            probability = classifier.predict_proba(df).max()

         # assign to dictionary 
            result["prediction"] = prediction
            result["probability"] = probability

         # append dictionary to list (many outputs)
            output.append(result)

    return output

It works fine, I'm just not quite sure if it's optimized or the right way to do it, since I convert the input two times to get the predictions. Also, I'm not sure if it is going to work fast in the case of having a huge number of inputs. Any improvements on this? If there's a way (even other than using Pydantic models), where I can work directly and avoid going through conversions and the loop.

Chris · Accepted Answer · 2023-02-13T06:33:39.063

0

First, you should use more descriptive names for your variables/objects. For example:

@app.post('/predict')
def predict(inputs: List[Inputs]):
    for input in inputs:
    # ...

You cannot pass the Pydantic model directly to the predict() function, as it accepts a data array, not a Pydantic model. Available options are listed below.

Option 1

You could use:

prediction = model.predict([[input.f1, input.f2, input.f3]])[0]

Option 2

If you don't wish to use a Pandas DataFrame, as shown in your question, i.e.,

df = pd.DataFrame([input.dict()])
prediction = model.predict(df)[0]

you could instead use the __dict__ method to get the values of all attributes in the model and convert it to a list:

prediction = model.predict([list(input.__dict__.values())])[0]

or, preferably, use the Pydantic's .dict() method:

prediction = model.predict([list(input.dict().values())])[0]

Option 3

You could avoid looping over individual items and calling the predict() function multiple times, by using, instead, the below:

import pandas as pd

df = pd.DataFrame([i.dict() for i in inputs])
prediction = model.predict(df)
probability = model.predict_proba(df)
return {'prediction': prediction.tolist(), 'probability': probability.tolist()}

or (in case you don't wish using Pandas DataFrame):

inputs_list = [list(i.dict().values()) for i in inputs]
prediction = model.predict(inputs_list)
probability = model.predict_proba(inputs_list)
return {'prediction': prediction.tolist(), 'probability': probability.tolist()}

edited Feb 13 '23 at 06:33

answered Apr 12 '22 at 22:37

Chris

18,724
6
46
80

I'm using a list of inputs I don't know how to use this approach in it, recheck my post I updated it with my case (list of inputs and list of outputs) – Legna Apr 12 '22 at 22:54
Thank you, what if I have many features because in reality I got 40 feature so it'll be exhausting to write them all – Legna Apr 12 '22 at 23:02
it won't be a waste of memory right? because for reach item I create a dataframe, imagine having 500k items? – Legna Apr 12 '22 at 23:05
got an error of "AttributeError: 'list' object has no attribute '__dict__'" – Legna Apr 13 '22 at 00:07
it works! I was hoping I can get rid of the loop and assign data directly as we normally do with mode.predict – Legna Apr 13 '22 at 00:52
It’s the same as the previous one, just instead of using a dataframe here there is a list, and dataframes are faster – Legna Apr 13 '22 at 01:17
`__dict__` is not a method, it is the actual namespace of the class, since the OP is using a pydantic model, they should almost certainly use `data.dict()` – juanpa.arrivillaga Apr 13 '22 at 01:50

Can we use Pydantic models (BaseModel) directly inside model.predict() using FastAPI, and if not ,why?

1 Answers1

Option 1

Option 2

Option 3

Linked

Related