This is my first tiptoe into using sklearn with pandas, so apologies if this is a basic question. This is my code:
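```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# df and predictors (a list of feature column names) are defined earlier
X = df[predictors]
y = df['Plc']

# Chronological 70/30 split: first 70% of rows for training, the rest for testing
X_train = X[:int(X.shape[0]*0.7)]
X_test = X[int(X.shape[0]*0.7):]
y_train = y[:int(X.shape[0]*0.7)]
y_test = y[int(X.shape[0]*0.7):]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
result = model.score(X_test, y_test)  # mean accuracy on the test rows
print("Accuracy: %.3f%%" % (result*100.0))
```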
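Because the split above slices rows by position, X_test and y_test keep df's original index labels, and that is what later allows predictions to be matched back to the right rows. A minimal sketch of the same 70/30 split, computing the cut point once and using `.iloc` so the slice is unambiguously positional (plain `y[:n]` on a Series with an integer index can be ambiguous between label and position). The tiny DataFrame, its column names `f1`/`f2`, and the index starting at 100 are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the real df: f1/f2 are hypothetical predictors,
# and the index deliberately starts at 100 to show that labels survive slicing.
df = pd.DataFrame(
    {"f1": range(10), "f2": range(10, 20), "Plc": [0, 1] * 5},
    index=range(100, 110),
)
predictors = ["f1", "f2"]

X = df[predictors]
y = df["Plc"]

# Compute the 70% cut point once and slice by position with .iloc.
split = int(len(df) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# The test rows keep their original labels from df.
print(list(X_test.index))  # [107, 108, 109]
```

`sklearn.model_selection.train_test_split(X, y, test_size=0.3, shuffle=False)` also does an unshuffled split and keeps the pandas index, although its rounding of the cut point may differ from `int(n * 0.7)` by a row.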
Now what I'm hoping to do is get the predicted values back into the original df, so I can look at the difference between the actual df['Plc'] column and the predicted values for the y_test rows.
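Since X_test keeps df's index labels, one option (a sketch, not the only way) is to write the predictions straight into df with `.loc` using `X_test.index`. Pandas matches on labels, so no `reset_index` or `merge` is needed, and training rows are left as NaN in the new column. The synthetic df, the feature columns `f1`/`f2`, and the output column name `pred` are hypothetical stand-ins for the real data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real df; f1/f2 and the index labels are made up.
df = pd.DataFrame(
    {
        "f1": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.05, 0.95],
        "f2": [1.0, 0.0, 0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.85, 0.15],
        "Plc": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    },
    index=range(100, 110),
)
predictors = ["f1", "f2"]
X, y = df[predictors], df["Plc"]

# Chronological 70/30 split by position.
split = int(len(df) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Write predictions only into the test rows, matched by index label.
# Training rows get NaN in the new 'pred' column.
df.loc[X_test.index, "pred"] = model.predict(X_test)

# Side-by-side view of actual vs predicted for the test rows only.
comparison = df.loc[X_test.index, ["Plc", "pred"]]
print(comparison)
```

This works because `model.predict` returns one prediction per row of X_test, in the same order as `X_test.index`, so the array lines up with the labels passed to `.loc`.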
I have tried the following, but I feel that a) it's probably not the best way and b) the index numbers aren't lining up as expected.
```python
y_pred = pd.DataFrame()
y_pred['preds'] = model.predict(X_test)    # index is 0..n-1
y_test = pd.DataFrame(y_test)
y_test['index1'] = y_test.index            # keep the original row labels as a column
y_test = y_test.reset_index()              # index is now 0..n-1
y_test = pd.concat([y_test,y_pred],axis=1)
y_test.set_index('index1')                 # returns a new DataFrame; not assigned back
df = df.reset_index()
df_out = pd.merge(df,y_test,how = 'inner',left_index = True, right_index = True)
```
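The mismatch in the attempt comes from the `reset_index()` calls: afterwards the predictions sit on index 0..k-1, and merging on the index of a reset df pairs them with df's first k rows rather than with the test rows. (Also, `y_test.set_index('index1')` returns a new DataFrame and its result is discarded.) A minimal sketch that reproduces the mismatch on a made-up example and then shows an index-preserving alternative: wrap the predictions in a Series that reuses `y_test.index` and `join` it to df. The list `fake_preds` is an invented placeholder for `model.predict(X_test)`.

```python
import pandas as pd

# Made-up data: 5 rows, with the last 2 acting as the test set.
df = pd.DataFrame({"Plc": [0, 1, 0, 1, 1]}, index=[10, 11, 12, 13, 14])
y_test = df["Plc"].iloc[3:]   # labels 13 and 14
fake_preds = [1, 0]           # placeholder for model.predict(X_test)

# What the reset_index approach effectively does: predictions get labels 0..1,
# so an index merge with a reset df pairs them with the FIRST rows of df.
preds_reset = pd.DataFrame({"preds": fake_preds})
misaligned = pd.merge(df.reset_index(), preds_reset,
                      how="inner", left_index=True, right_index=True)
print(misaligned)  # rows for original labels 10 and 11, not 13 and 14

# Index-preserving alternative: give the predictions y_test's labels,
# then join; rows outside the test set get NaN.
preds = pd.Series(fake_preds, index=y_test.index, name="preds")
aligned = df.join(preds)
print(aligned)
```

Keeping the original index on the predictions avoids the bookkeeping columns (`index1`) entirely; `df.join` does a left join on the index by default.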
Any ideas on what I should be doing instead? Thanks!