I am trying to convert a Pydantic model to a Pandas DataFrame, but I am getting various errors.

Here is the code:

from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import sklearn
import pandas as pd
import numpy as np

class Userdata(BaseModel):
  current_res_month_dec: Optional[int] = 0
  current_res_month_nov:  Optional[int] = 0


async def return_recurrent_user_predictions_gb(user_data: Userdata):

      empty_dataframe =  pd.DataFrame([Userdata(**{
      'current_res_month_dec': user_data.current_res_month_dec,
      'current_res_month_nov': user_data.current_res_month_nov})], ignore_index=True)

This is the DataFrame that is returned when trying to execute it through /docs in my local environment:

Response body
{
  "0": {
    "0": [
      "current_res_month_dec",
      0
    ]
  },
  "1": {
    "0": [
      "current_res_month_nov",
      0
    ]
  }
}

but if I try to use this DataFrame for a prediction:

model_has_afternoon = pickle.load(open('./models/model_gbclf_prob_current_product_has_afternoon.pickle', 'rb'))
result_afternoon = model_has_afternoon.predict_proba(empty_dataframe)[:, 1]

I get this error:

ValueError: setting an array element with a sequence.

I have tried building my own DataFrame before, and the predictions should work with a DataFrame.

Chris
Gotey

1 Answer

You first need to convert the Pydantic model into a dictionary using Pydantic's dict() method (in Pydantic v2 this method is deprecated in favour of model_dump()). Passing the model instance itself to pandas.DataFrame() does not work as intended, because iterating over a Pydantic model yields (field name, value) tuples; this is why each cell in your DataFrame holds a pair such as ["current_res_month_dec", 0]. Note that other approaches, such as Python's dict() function and the .__dict__ attribute, have been found to be faster than Pydantic's dict() method (see this answer). However, since you are working with a Pydantic model, it might be best to use Pydantic's own method and then pass the resulting dictionary to pandas.DataFrame() wrapped in a list; for example, pd.DataFrame([data.dict()]). As described in this answer, this approach can be used when you need the keys of the passed dict to become the columns and the values to become the row. If you need a different orientation, you can use pandas.DataFrame.from_dict() instead. Afterwards, you can call model.predict(df) to get predictions, as demonstrated here and here.
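As a small illustration of the two orientations mentioned above, the sketch below uses a plain dict as a stand-in for the output of data.dict(). The field names and the "value" column label are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Stand-in for data.dict(); the keys mirror the example model's fields.
record = {"col1": 0, "col2": 0, "col3": "foo"}

# Wrapping the dict in a list gives one row, with the keys as columns.
wide = pd.DataFrame([record])

# orient="index" instead uses the keys as row labels; "value" is an
# arbitrary name chosen here for the single resulting column.
by_field = pd.DataFrame.from_dict(record, orient="index", columns=["value"])

print(wide)
print(by_field)
```

For a scikit-learn style predict() call, the single-row layout is generally the one you want, since each row is treated as a sample and each column as a feature.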

Working Example

from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd

app = FastAPI()

class Userdata(BaseModel):
    col1: Optional[int] = 0
    col2: Optional[int] = 0
    col3: str = "foo"

@app.post('/submit')
def submit_data(data: Userdata):
    df = pd.DataFrame([data.dict()])
    # pred = model.predict(df)
    return "Success"
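To try the conversion on the model from the question outside of FastAPI, something along these lines might help. It is only a sketch: model_to_frame is a hypothetical helper name, and the hasattr check is an assumption made here so that the same code runs on both Pydantic v1 (dict()) and v2 (model_dump()).

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel


class Userdata(BaseModel):
    current_res_month_dec: Optional[int] = 0
    current_res_month_nov: Optional[int] = 0


def model_to_frame(model: BaseModel) -> pd.DataFrame:
    """Hypothetical helper: convert one model instance to a one-row DataFrame."""
    # Pydantic v2 exposes model_dump(); v1 only has dict().
    if hasattr(model, "model_dump"):
        values = model.model_dump()
    else:
        values = model.dict()
    return pd.DataFrame([values])


user_data = Userdata(current_res_month_dec=3)

# Iterating a model yields (field name, value) pairs, which is what ends up
# inside the cells if the instance itself is passed to pd.DataFrame.
pairs = list(user_data)

df = model_to_frame(user_data)
print(pairs)
print(df)
```

The resulting df has one column per model field and a single row of plain values, which is the shape predict_proba() expects for one sample.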

More Options

Since you mentioned that you would like to use the DataFrame for Machine Learning predictions, note that there are a few other ways to pass the data to the predict() and predict_proba() functions that do not require creating a DataFrame. These options include:

model.predict([[data.col1, data.col2, data.col3]])

and

model.predict([list(data.dict().values())])
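One caveat with both options is that the values must arrive in the same order as the features the model was trained on. A minimal sketch of making that order explicit is shown below; FEATURE_ORDER and to_feature_row are hypothetical names, and the plain dict stands in for data.dict().

```python
# Hypothetical: the column order the model was trained with.
FEATURE_ORDER = ["col1", "col2"]


def to_feature_row(values: dict) -> list:
    """Return a single-sample 2D list with features in training order."""
    return [[values[name] for name in FEATURE_ORDER]]


# Stand-in for data.dict(); note the keys are deliberately out of order.
payload = {"col2": 7, "col1": 3}

row = to_feature_row(payload)
print(row)
# The result could then be passed as model.predict(row).
```

Selecting values by name rather than relying on list(data.dict().values()) keeps the input stable if fields are later added to or reordered in the model.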

Please have a look at this answer for more details. If you also need to respond to the client with a DataFrame in JSON format, please take a look here.

Chris