20

I am storing the results from a sklearn regression model in the variable prediction.

prediction = regressor.predict(data[['X']])
print(prediction)

The values of the prediction output look like this:

[ 266.77832991  201.06347505  446.00066136  499.76736079  295.15519906
  214.50514991  422.1043505   531.13126879  287.68760191  201.06347505
  402.68859792  478.85808879  286.19408248  192.10235848]

I am then trying to use the to_csv function to save the results to a local CSV file:

prediction.to_csv('C:/localpath/test.csv')

But the error I get back is:

AttributeError: 'numpy.ndarray' object has no attribute 'to_csv'

I am using Pandas/Numpy/SKlearn. Any idea on the basic fix?

ZJAY

4 Answers

36

You can use pandas. As the error message says, numpy arrays don't have a to_csv method, but a pandas DataFrame does.

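import pandas as pd

# Wrap the array in a one-column DataFrame and write it out.
# to_csv returns None when given a path, so the result is not assigned.
pd.DataFrame(prediction, columns=['predictions']).to_csv('prediction.csv')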

Add ".T" after the DataFrame to transpose it if you want the values written as a single row instead of a column.
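
As a minimal sketch of the approach above, the snippet below wraps a 1-D array in a DataFrame, writes it out, and reads it back. The array values are copied from the question; the temporary output path and the use of index=False are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Stand-in for regressor.predict(...); values taken from the question.
prediction = np.array([266.77832991, 201.06347505, 446.00066136])

# Hypothetical output location; replace with your own path.
out_path = Path(tempfile.mkdtemp()) / "prediction.csv"

# Wrap the array in a one-column DataFrame and write it without the row index.
df = pd.DataFrame(prediction, columns=["predictions"])
df.to_csv(out_path, index=False)

# Read the file back to confirm the round trip.
restored = pd.read_csv(out_path)
```

Here df.T would have one row labelled "predictions" and one column per value, which is the row-wise layout mentioned above.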

DavidK
  • If I want to merge with a unique identifier from `X_test` (an "id" column, not the index), will the prediction results correctly match every row? As in: `output=pd.DataFrame(data={"id":X_test["id"],"Prediction":y_pred})` `output.to_csv(path_or_buf="..\\output\\results.csv",index=False,quoting=3,sep=';')` – mrbTT May 27 '18 at 15:03
  • If X_test has the same length as y_pred, the answer is yes. – DavidK Oct 08 '19 at 11:40
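
The point made in these comments can be sketched as follows. The frame `X_test`, its `id` column, and the values in `y_pred` are made-up stand-ins, and the sketch assumes `y_pred` was produced from `X_test` in its current row order.

```python
import numpy as np
import pandas as pd

# Hypothetical test set with a non-default index, as after a train/test split.
X_test = pd.DataFrame({"id": [101, 102, 103], "X": [1.0, 2.0, 3.0]}, index=[7, 2, 5])

# Stand-in for regressor.predict(X_test[["X"]]); one value per row, in row order.
y_pred = np.array([10.0, 20.0, 30.0])

# Converting the id column to a plain array makes the pairing purely positional.
output = pd.DataFrame({"id": X_test["id"].to_numpy(), "Prediction": y_pred})
```

If `y_pred` were a pandas Series with a different index, pandas would align the two columns on index labels rather than on position, which is why the sketch pairs plain arrays.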
17

You can use the numpy.savetxt function:

import numpy
numpy.savetxt('C:/localpath/test.csv', prediction, delimiter=',')

To load the CSV file back into an array, you can use the numpy.genfromtxt function:

numpy.genfromtxt('C:/localpath/test.csv', delimiter=',')
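
A small round-trip sketch of the two calls, under some assumptions: the array values come from the question, the file goes to a temporary directory rather than C:/localpath, and the header line is an optional extra added here for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for regressor.predict(...); values taken from the question.
prediction = np.array([266.77832991, 201.06347505, 446.00066136])

# Hypothetical output location; replace with your own path.
out_path = Path(tempfile.mkdtemp()) / "test.csv"

# header= writes a first line; comments="" stops NumPy prefixing it with "# ".
np.savetxt(out_path, prediction, delimiter=",", header="prediction", comments="")

# skip_header=1 skips that header line when loading.
loaded = np.genfromtxt(out_path, delimiter=",", skip_header=1)
```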
Ali
  • I had to reshape my data after loading, i.e. `pred_train = np.genfromtxt('encoded1.csv', delimiter=" ").reshape(-1, 1)`. Isn't there a way to save and load the data without having to think about reshaping it? – Saber Feb 06 '19 at 21:49
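
Two possible ways to avoid the manual reshape raised in this comment, offered as a sketch rather than a definitive answer. The array contents and file names are invented for illustration. The first option swaps CSV for NumPy's binary .npy format, the second stays with text and uses the ndmin argument of numpy.loadtxt.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical column vector of predictions, shape (4, 1).
pred_train = np.arange(4, dtype=float).reshape(-1, 1)
tmp = Path(tempfile.mkdtemp())

# Option 1: the binary .npy format stores shape and dtype alongside the data.
np.save(tmp / "pred_train.npy", pred_train)
from_npy = np.load(tmp / "pred_train.npy")

# Option 2: stay with text and ask loadtxt for at least 2 dimensions,
# so a single-column file is not squeezed down to 1-D.
np.savetxt(tmp / "pred_train.csv", pred_train, delimiter=",")
from_csv = np.loadtxt(tmp / "pred_train.csv", delimiter=",", ndmin=2)
```

The .npy file is not human-readable, so the text route is the better fit when the file has to be opened in a spreadsheet.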
5

This is a more detailed solution than a case like this strictly needs, but you can use it even in production.

First Save the Model

import joblib
joblib.dump(regressor, "regressor.sav")

Save columns in order

pd.DataFrame(X_train.columns).to_csv("feature_list.csv", index=False)

Save data types of train set

pd.DataFrame(X_train.dtypes).reset_index().to_csv("data_types.csv", index=False)

Using it again:

feature_list = pd.read_csv("feature_list.csv")
feature_list = pd.Index(list(feature_list["0"]))

add_cols = list(feature_list.difference(X_test.columns))

drop_cols = list(X_test.columns.difference(feature_list))

for col in add_cols:
    X_test[col] = np.nan

for col in drop_cols:
    X_test = X_test.drop(col, axis = 1)

# reorder columns
X_test = X_test[feature_list]

types = pd.read_csv("data_types.csv")
for i in range(len(types)):
    X_test[types.iloc[i,0]] = X_test[types.iloc[i,0]].astype(types.iloc[i,1])
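
The add/drop/reorder steps in the block above can also be sketched with a single DataFrame.reindex call. This is an alternative formulation, not the answer's own code, and the column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical saved training column order.
feature_list = pd.Index(["a", "b", "c"])

# Hypothetical test frame: "b" is missing, "extra" was not seen in training.
X_test = pd.DataFrame({"c": [1.0, 2.0], "extra": [9, 9], "a": [3.0, 4.0]})

# reindex adds missing columns filled with NaN, drops unknown ones,
# and puts the remaining columns in the training order.
X_test = X_test.reindex(columns=feature_list)
```

Columns filled with NaN this way cannot be cast to an integer dtype afterwards, so the dtype-restoring loop may need a fill value for those columns first.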

Make Predictions

regressor = joblib.load("regressor.sav")
predictions = regressor.predict(X_test)

Save Prediction Results

res = pd.DataFrame(predictions)
res.index = X_test.index # keep the original index so rows can be compared later
res.columns = ["prediction"]
res.to_csv("prediction_results.csv")

Enjoy end to end model/prediction saver code!

Ilker Kurtulus
0
predictions=regressor.predict(send_to_model)
#print(predictions)
output=pd.DataFrame({"Survived":predictions})
output.to_csv('C:/Users/<username>/Downloads/predictions.csv',index=False)