I have finished training a scikit-learn model and saved it as a pickle file. Now I want to load the model and run predictions, but I don't know how to preprocess the input data.
import pandas as pd

dataset = {
    'airline': ['SpiceJet', 'Indigo', 'Air_India']
}
df = pd.DataFrame.from_dict(dataset)
The `airline` column contains three airlines, which are used to create dummy columns with this code:
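def preprocessing(df):
    # prefix="airline" names each dummy column airline_<value>; all three
    # categories are kept (no drop_first) so the columns match the schema below
    dummies = pd.get_dummies(df["airline"], prefix="airline")
    return dummies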
The training dataset will have a schema like this:
| airline_SpiceJet | airline_Indigo | airline_Air_India |
| ---------------- | -------------- | ----------------- |
My question: given the input below, how can I map it to the corresponding columns?
input = {
    'airline': ['SpiceJet']
}
The expected output for this input:
| airline_SpiceJet | airline_Indigo | airline_Air_India |
| ---------------- | -------------- | ----------------- |
| 1 | 0 | 0 |
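
One direction I have been considering, as a rough sketch rather than a confirmed solution: encode the input with `pd.get_dummies` and then align it to the training columns with `DataFrame.reindex`. This assumes the training column names are saved alongside the pickled model. `TRAIN_COLUMNS` and `encode_input` are hypothetical names used only for illustration.

```python
import pandas as pd

# Assumption: the column names (and order) used at training time were saved
# together with the model and can be loaded back at prediction time.
TRAIN_COLUMNS = ["airline_SpiceJet", "airline_Indigo", "airline_Air_India"]


def encode_input(raw: dict) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode `raw` and align it to TRAIN_COLUMNS."""
    frame = pd.DataFrame(raw)
    dummies = pd.get_dummies(frame["airline"], prefix="airline")
    # reindex adds the training columns that are missing from this input
    # (filled with 0) and drops any column the model never saw.
    aligned = dummies.reindex(columns=TRAIN_COLUMNS, fill_value=0)
    # Cast to int so the values are 0/1 instead of booleans.
    return aligned.astype(int)


encoded = encode_input({"airline": ["SpiceJet"]})
print(encoded)
```

With the single `SpiceJet` row, this should give 1, 0, 0 in the column order above. An airline that was not present during training would end up as a row of all zeros, which may or may not be the desired behaviour.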