Suppose we have a pandas DataFrame and a scikit-learn model trained (fit) on that DataFrame. Is there a way to do row-wise prediction? The use case is to fill in empty values in the DataFrame using the model's predict function.
I expected that this would be possible using the pandas apply function (with axis=1), but I keep getting dimensionality errors.
Using Pandas version '0.22.0' and sklearn version '0.19.1'.
Simple example:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
data = [[x,y,x*y] for x in range(1,10) for y in range(10,15)]
df = pd.DataFrame(data,columns=['input1','input2','output'])
model = KMeans()
model.fit(df[['input1','input2']],df['output'])
df['predictions'] = df[['input1','input2']].apply(model.predict,axis=1)
The resulting dimensionality error:
ValueError: ('Expected 2D array, got 1D array instead:\narray=[ 1.
10.].\nReshape your data either using array.reshape(-1, 1) if your data has
a single feature or array.reshape(1, -1) if it contains a single sample.',
'occurred at index 0')
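As far as I can tell, the mismatch is that apply with axis=1 hands each row to the function as a 1D object, while predict wants a 2D array of shape (n_samples, n_features). A minimal sketch of the shapes involved, using NumPy only (the variable names are made up for illustration):

```python
import numpy as np

# One row as apply(axis=1) would pass it along: 1D, shape (2,)
row = np.array([1.0, 10.0])

# predict expects 2D input, so a single sample needs shape (1, 2)
one_sample = row.reshape(1, -1)     # the reshape the error message suggests
also_one_sample = np.array([row])   # wrapping the row in a list does the same

print(row.shape, one_sample.shape, also_one_sample.shape)
```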
Running predict on the whole column works fine:
df['predictions'] = model.predict(df[['input1','input2']])
However, I want the flexibility to use this row-wise.
I've tried various approaches to reshape the data first, for example:
def reshape_predict(df):
    return model.predict(np.reshape(df.values,(1,-1)))
df[['input1','input2']].apply(reshape_predict,axis=1)
This just returns the input with no error, whereas I expected it to return a single column of output values (as an array).
SOLUTION:
Thanks to Yakym for providing a working solution! Trying a few variants based on his suggestion, the easiest solution was to simply wrap the row values in square brackets (I tried this previously, but without the 0 index for the prediction, with no luck).
df['predictions'] = df[['input1','input2']].apply(lambda x: model.predict([x])[0],axis=1)
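For completeness, here is a self-contained sketch of the same idea. The KMeans settings (n_clusters, n_init, random_state) and the mask choosing which rows to fill are assumptions made for the example, not part of the original problem. Wrapping the row values in a list gives predict a (1, 2) input, and [0] pulls the single prediction back out of the length-1 result.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = [[x, y, x * y] for x in range(1, 10) for y in range(10, 15)]
df = pd.DataFrame(data, columns=['input1', 'input2', 'output'])
features = ['input1', 'input2']

# Illustrative settings; KMeans is unsupervised, so no target is passed to fit
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(df[features].values)

# Row-wise: [row.values] has shape (1, 2); [0] extracts the single label
row_wise = df[features].apply(lambda row: model.predict([row.values])[0], axis=1)

# Filling only some rows: predict in one call on the selected subset
df['predictions'] = np.nan
mask = df['input1'] > 5  # hypothetical stand-in for "rows with empty values"
df.loc[mask, 'predictions'] = model.predict(df.loc[mask, features].values)

print(row_wise.head())
print(df['predictions'].notna().sum(), "rows filled")
```

If only a subset of rows needs filling, the masked version is probably the faster option, since it calls predict once rather than once per row.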