
I have a Random Forest model, and I want to compute the model score for just a single input.

Code to calculate the score:

x_small = X_valid.head(1)
y_small = y_valid.head(1)

Ypredict = Pickled_LR_Model.predict(x_small)
print(Ypredict)

small_score = Pickled_LR_Model.score(x_small, y_small)
print("Test score: {0:.2f} %".format(100 * small_score))

The error I am getting:

UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.

The error is self-explanatory. Is there a way I can compute the model score, or any other relevant score, for a single input? My current model will be hosted in a Flask application, where the input will be a single record only.

K.Pil (question edited by desertnaut)

2 Answers


I am assuming you are working with a random forest regressor. If this is the case, you can calculate the difference between the predicted value and the real value (if the real value is available).

For example:

# Option 1 - observed error
print('Test score: ', round(Ypredict.item() - y_small.item(), 2))
# Option 2 - observed error as a proportion of the true value
print('Test score: ', round((Ypredict.item() - y_small.item()) / y_small.item(), 2))

There are many other metrics you can use to measure the performance of a regressor. You can find some of them here (look at the Regression entries in section 3.3.1.1, Common cases: predefined values).
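For instance, a minimal sketch with two of those predefined metrics that remain well-defined for a single sample (dummy values, for illustration only; mean_absolute_percentage_error requires scikit-learn >= 0.24):

from sklearn.metrics import max_error, mean_absolute_percentage_error

# hypothetical single observation (dummy values)
y_true = [3]
y_pred = [2.5]

max_error(y_true, y_pred)
# 0.5

mean_absolute_percentage_error(y_true, y_pred)
# 0.16666666666666666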

Arturo Sbr
  • Why `round`? And how is this essentially different from what has already been suggested in my answer? – desertnaut Mar 24 '21 at 15:21
  • We don't know the decimals in `y_small` or `Ypredict`, so might as well. As for your answer, you are suggesting MAE and MSE. I did not square nor convert the difference to absolute value. I also suggested the error as a proportion of the ground truth. By the way, how come you did not vote to close this question? I've seen you do that tons of times in questions such as this. – Arturo Sbr Mar 24 '21 at 15:55
  • You should have already seen that I have also suggested the difference, only after *justifying* its use as essentially the same thing with RMSE & MAE in the special case of single predictions and not as an ad hoc solution. Now, if you think that `round` and the proportion justified a separate answer (after repeating part of the existing one), so be it... – desertnaut Mar 24 '21 at 16:03
  • ... and you didn't like the edit, which went directly to the section of interest; I see. Regarding the "*questions such as this*" part, I would kindly suggest you read more closely. – desertnaut Mar 24 '21 at 16:04
  • Alright. I am still curious as to why you decided not to vote to close this question though. Not that I want it closed, obviously. It is just really similar to a question you closed a few weeks ago. – Arturo Sbr Mar 24 '21 at 16:09
  • Care to share the said question? I would bet that, under close inspection, it is not similar - here there is a specific *programming* issue, explicitly mentioned in the question (hosted flask app). I could have been wrong of course, but keeping the discussion hypothetical & speculative w/o concrete facts is not particularly meaningful or productive. Plus, there are always "grey zones". – desertnaut Mar 24 '21 at 16:12
  • Here's a link to [a recent example](https://stackoverflow.com/questions/66701122/scikit-learn-given-numerical-target-variable-should-i-transform-the-target-var/) . I suppose this is a grey area. Like I said, I was literally just curious as the current post is more of a conceptual doubt regarding regressor metrics, not programming. Anyway, I'm getting a little bored of posting comments. Good talking to you, and good edit! – Arturo Sbr Mar 24 '21 at 16:27
  • 1
    Indeed, this is a *conceptual* question ("*I would like to understand if [...]*"), without any real programming issue, plus a recommendation request (again, on methodology). – desertnaut Mar 24 '21 at 17:03

As the error says, R-squared is not well-defined for single predictions; in fact, scoring for single predictions does not make much sense in general, either.

Nevertheless, if you must do it for other (e.g. programming) reasons, you can use other performance metrics for regression, like RMSE or MAE (which, by definition, are equal for single predictions):

from sklearn.metrics import mean_squared_error, mean_absolute_error

# dummy data - must be single-element arrays, otherwise it throws an error
y_true = [3]
y_pred = [2.5]

# RMSE (note: in scikit-learn >= 1.4, squared=False is deprecated in favor of root_mean_squared_error):
mean_squared_error(y_true, y_pred, squared=False)
# 0.5

# MAE:
mean_absolute_error(y_true, y_pred)
# 0.5

FWIW, RMSE & MAE make much more sense as performance measures in such predictive settings than R-squared; for details, see the last part of my own answer in scikit-learn & statsmodels - which R-squared is correct?

Notice that these quantities should be presented as-is, and not as percentages (again, computing any percentage quantity for a single prediction does not make any sense); you may have already noticed that, in the special case of single predictions, they have a very natural interpretation, i.e. they are simply the absolute difference between the prediction and the ground truth (here 0.5).

Having clarified that, you could of course make your code slightly more efficient, by simply taking the difference between the prediction and the ground truth:

import numpy as np
np.array(y_true) - np.array(y_pred) # won't work with simple Python lists
# array([0.5])

resting assured that what you actually compute is (in absolute value) the RMSE/MAE, and not something ad hoc.
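As a quick sanity check of this equivalence (a minimal sketch, reusing the dummy data from above):

from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3]
y_pred = [2.5]

# the plain absolute difference coincides with both metrics
diff = abs(y_true[0] - y_pred[0])                          # 0.5
diff == mean_absolute_error(y_true, y_pred)                # True
diff == mean_squared_error(y_true, y_pred, squared=False)  # True (RMSE)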

desertnaut