
I would like to build a dataframe that compares the predicted results of a regression model (y_hat) with the test data (y_test). I have two access methods for selecting the test data. Access method 1 works but Access method 2 doesn't when I try to build the comparison dataframe.

Access method 1:

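import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_data = df_scores[['Hours']]  # list of column labels -> DataFrame
y_data = df_scores['Scores']   # single column label -> Series
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.20, random_state=0)
lm = LinearRegression()
lm.fit(X_train, y_train)
y_hat = lm.predict(X_test)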

This dataframe works:

df_scores_comp = pd.DataFrame({'Actual':y_test, 'Predicted':y_hat})
df_scores_comp

Access method 2:

But if I select X_data and y_data with .loc instead ...

X_data = df_scores.loc[:, ['Hours']]
y_data = df_scores.loc[:, ['Scores']]

I get the following error when I build the comparison dataframe:

ValueError: If using all scalar values, you must pass an index

With either access method, y_hat is an array and X_data is a dataframe. However, y_data is a Series with the first access method and a one-column dataframe with the second. I thought the clue might be in how lm.predict handles that difference, but I can't figure it out.
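The type difference described above can be checked directly. The following is a minimal sketch, assuming a small made-up df_scores (the Hours and Scores values here are illustrative, not the real data): selecting with a scalar label returns a Series, while selecting with a list of labels returns a one-column DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for df_scores; the values are made up for illustration.
df_scores = pd.DataFrame({"Hours": [2.5, 5.1, 3.2], "Scores": [21, 47, 27]})

# Scalar column label: the result is a one-dimensional Series.
y_series = df_scores.loc[:, "Scores"]

# List of column labels: the result is a two-dimensional, one-column DataFrame.
y_frame = df_scores.loc[:, ["Scores"]]

print(type(y_series).__name__, y_series.ndim, y_series.shape)
print(type(y_frame).__name__, y_frame.ndim, y_frame.shape)
```

With scikit-learn's LinearRegression, fitting on a two-dimensional y also makes predict return an array of shape (n_samples, 1) rather than (n_samples,), so in access method 2 both y_test and y_hat are two-dimensional.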

When I tried the solution offered here (Constructing pandas dataframes...) by wrapping the dictionary in a list, I don't get an error. But the result isn't right: the y_hat (predicted) elements are in the correct column, but are squeezed into one row. And the y_test (Actual) elements and the index values are mixed up in the wrong columns and are squeezed into one row as well. Something like this:

    Actual                          Predicted
0   Scores 5 20 2 27 19 69 16...    [[16.884144762398048], [33.73226077948985], [7...

It should look like this (which it does using the first access method):

    Actual  Predicted
5       20  16.884145
2       27  33.732261
19      69  75.357018
16      30  26.794801
11      62  60.491033
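One possible way to get this layout with access method 2 is to flatten both inputs to one dimension before building the dataframe. This is a sketch under assumptions: y_test and y_hat below are hand-built stand-ins with the shapes that access method 2 would produce, filled with the values from the expected output, and are not the output of the actual model.

```python
import numpy as np
import pandas as pd

# Stand-in for y_test from access method 2: a one-column DataFrame
# (values and index copied from the expected output in the question).
y_test = pd.DataFrame({"Scores": [20, 27, 69, 30, 62]}, index=[5, 2, 19, 16, 11])

# Stand-in for y_hat from access method 2: a 2-D array of shape (5, 1).
y_hat = np.array([[16.884145], [33.732261], [75.357018], [26.794801], [60.491033]])

# Pull out the single column as a Series and flatten the predictions to 1-D,
# so that each dictionary value is one-dimensional.
df_scores_comp = pd.DataFrame(
    {"Actual": y_test["Scores"], "Predicted": y_hat.ravel()}
)
print(df_scores_comp)
```

Selecting y_test["Scores"] keeps the original row index, so the rows stay labelled 5, 2, 19, 16, 11. Defining y_data as df_scores.loc[:, 'Scores'] (a scalar label instead of a list) would likely avoid the problem at the source, since y_test and y_hat would then both be one-dimensional.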
jwburritt
  • The answer to this question is also your answer: https://stackoverflow.com/questions/17839973/constructing-pandas-dataframe-from-values-in-variables-gives-valueerror-if-usi – venkata krishnan Jul 21 '20 at 01:38
  • Does this answer your question? [Constructing pandas DataFrame from values in variables gives "ValueError: If using all scalar values, you must pass an index"](https://stackoverflow.com/questions/17839973/constructing-pandas-dataframe-from-values-in-variables-gives-valueerror-if-usi) – venkata krishnan Jul 21 '20 at 01:39
