Random forest accuracy too low

Question

I want to predict electricity consumption using a random forest. After preparing the data, the current state is as follows:

X=df[['Temp(⁰C)','Araç Sayısı (adet)','Montaj V362_WH','Montaj V363_WH','Montaj_Temp','avg_humidity']]

X.head(15)

Output:

     Temp(⁰C)  Araç Sayısı (adet)  Montaj V362_WH  Montaj V363_WH  Montaj_Temp  avg_humidity
 0   3.250000                 0.0             0.0             0.0    17.500000     88.250000
 1   3.500000               868.0            16.0            18.0    20.466667     82.316667
 2   3.958333               774.0            18.0            18.0    21.166667     87.533333
 3   6.541667                 0.0             0.0             0.0    18.900000     83.916667
 4   4.666667               785.0            16.0            18.0    20.416667     72.650000
 5   2.458333               813.0            18.0            18.0    21.166667     73.983333
 6  -0.458333               804.0            16.0            18.0    20.500000     72.150000
 7  -1.041667               850.0            16.0            16.0    19.850000     76.433333
 8  -0.375000               763.0            16.0            18.0    20.500000     76.583333
 9   4.375000              1149.0            16.0            16.0    21.416667     84.300000
10   8.541667                 0.0             0.0             0.0    21.916667     71.650000
11   6.625000               763.0            16.0            18.0    22.833333     73.733333
12   5.333333               783.0            16.0            16.0    22.166667     69.250000
13   4.708333               764.0            16.0            18.0    21.583333     66.800000
14   4.208333               813.0            16.0            16.0    20.750000     68.150000

y.head(15)

Output:

    Montaj_ET_kWh/day
0   11951.0
1   41821.0
2   42534.0
3   14537.0
4   41305.0
5   42295.0
6   44923.0
7   44279.0
8   45752.0
9   44432.0
10  25786.0
11  42203.0
12  40676.0
13  39980.0
14  39404.0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=None)

clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature in zip(feature_list, clf.feature_importances_):
    print(feature)

Output:

  ('Temp(⁰C)', 0.11598075020423881)
  ('Araç Sayısı (adet)', 0.7047301384616493)
  ('Montaj V362_WH', 0.04065706901940535)
  ('Montaj V363_WH', 0.023077554218712878)
  ('Montaj_Temp', 0.08082006262985514)
  ('avg_humidity', 0.03473442546613837)


sfm = SelectFromModel(clf, threshold=0.10)
sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])

for feature_list_index in sfm.get_support(indices=True):
    print(feature_list[feature_list_index])

Output:

  Temp(⁰C)
  Araç Sayısı (adet)

X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)

clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)
y_test = y_test.values
y_pred = clf.predict(X_test)
y_test = y_test.reshape(-1, 1)
y_pred = y_pred.reshape(-1, 1)
y_test = y_test.ravel()
y_pred = y_pred.ravel()
label_encoder = LabelEncoder()
y_pred = label_encoder.fit_transform(y_pred)
y_test = label_encoder.fit_transform(y_test)

accuracy_score(y_test, y_pred)

Output:

 0.010964912280701754

I have no idea why the accuracy is so low. Any idea where I made a mistake?

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12


Your mistake is that you are asking for accuracy (a classification metric) in a regression setting, which is meaningless.

From the accuracy_score documentation (emphasis added):

sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)

Accuracy **classification** score.
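
A minimal sketch of why the number is meaningless here, assuming a recent scikit-learn. Only `y_true` is taken from the question's `y.head(15)` output; `y_shifted` and `y_close` are made-up predictions for illustration. Label-encoding each array separately, as in the question, replaces every value by its rank within that array, so the resulting "accuracy" compares ranks rather than prediction quality:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# First five targets from the question's y.head(15) output
y_true = np.array([11951.0, 41821.0, 42534.0, 14537.0, 41305.0])

# Hypothetical predictions: every one is off by 5000.5 kWh, ordering unchanged
y_shifted = y_true + 5000.5
# Hypothetical predictions: every one is within 2% of the truth, ordering changed
y_close = np.array([12000.0, 42600.0, 41800.0, 14500.0, 42000.0])

# Feeding continuous predictions to accuracy_score directly is rejected
try:
    accuracy_score(y_true, y_shifted)
    raised = False
except ValueError:
    raised = True

def rank_accuracy(a, b):
    """Accuracy after label-encoding each array on its own (the question's approach)."""
    return accuracy_score(LabelEncoder().fit_transform(a),
                          LabelEncoder().fit_transform(b))

acc_shifted = rank_accuracy(y_true, y_shifted)  # 1.0, although every prediction is badly off
acc_close = rank_accuracy(y_true, y_close)      # 0.4, although every prediction is close

print(raised, acc_shifted, acc_close)
```

The poor predictions score perfectly and the good ones score badly, which is why the value reported in the question says nothing about the model.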

Check the list of metrics available in scikit-learn for suitable regression metrics (there you can also confirm that accuracy is used only in classification). For more details, see my answer in "Accuracy Score ValueError: Can't Handle mix of binary and continuous target".
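
As a rough sketch of what evaluating with regression metrics could look like: the data below is synthetic (a made-up linear relation plus noise standing in for the question's DataFrame), and the choice of MAE, MSE, RMSE and R² is only one reasonable option among the available regression metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): two features loosely modelled on
# temperature and vehicle count, with a noisy consumption-like target
n = 300
temp = rng.uniform(-5, 25, size=n)
vehicles = rng.integers(0, 1200, size=n).astype(float)
X = np.column_stack([temp, vehicles])
y = 12000 + 35 * vehicles - 100 * temp + rng.normal(0, 1000, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

reg = RandomForestRegressor(n_estimators=100, random_state=0, n_jobs=-1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Regression metrics compare continuous values directly; no label encoding needed
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the target
r2 = r2_score(y_test, y_pred)

print(f"MAE:  {mae:.1f}")
print(f"MSE:  {mse:.1f}")
print(f"RMSE: {rmse:.1f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are expressed in the units of the target, while R² is unitless, so it is often easier to compare across datasets.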

answered Mar 31 '19 at 16:40

desertnaut

Firstly, thanks for your valuable response. Additionally, I wrote a script, but the resulting MSE and MAE were too high. Any idea? MSE: 1271612216.6578057, MAE: 34036.22794627194 – tfirinci Mar 31 '19 at 17:25
@tfirinci 1) since the answer arguably resolved your reported issue, kindly accept it 2) unlike accuracy, which by definition lies in `[0, 1]`, there is in principle no way to tell beforehand that an MSE or MAE is "too high" (among other things, it critically depends on the *scale* of the outputs, too). But in any case, comments are not the suitable place for follow-up questions - if you feel so, please open a new one – desertnaut Mar 31 '19 at 17:31
@tfirinci Please **do not** change questions *after* they have been answered, especially if the change makes (valuable!) answers look irrelevant and wrong! This is not how SO works. Instead, as I already said, accept the ("valuable"!) answer, and **open a new question** (it is free)! Reverted the question to its previous version... – desertnaut Mar 31 '19 at 18:02
@desertnaut You are right, that was my mistake, sorry. I have already upvoted your answer and written a new question; the link is below: https://stackoverflow.com/questions/55444271/mse-and-mae-value-are-too-high-on-randomforest – tfirinci Mar 31 '19 at 18:52
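
To illustrate the point made in the comments that MSE and MAE depend on the scale of the outputs, here is a small sketch. The true values are taken from the question's `y.head(15)` output; the predictions are hypothetical. Expressing the same quantities in MWh instead of kWh shrinks MAE by a factor of 1000 and MSE by a factor of 1000², while a ratio such as MAE divided by the mean target (one possible scale-free check, not something the thread prescribes) stays the same:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# True values from the question; predictions are made up for illustration
y_true_kwh = np.array([41821.0, 42534.0, 14537.0, 41305.0])
y_pred_kwh = np.array([40000.0, 44000.0, 15000.0, 41000.0])

# The same quantities expressed in MWh
y_true_mwh = y_true_kwh / 1000.0
y_pred_mwh = y_pred_kwh / 1000.0

mae_kwh = mean_absolute_error(y_true_kwh, y_pred_kwh)
mae_mwh = mean_absolute_error(y_true_mwh, y_pred_mwh)
mse_kwh = mean_squared_error(y_true_kwh, y_pred_kwh)
mse_mwh = mean_squared_error(y_true_mwh, y_pred_mwh)

# A scale-free view: error relative to the typical size of the target
rel_kwh = mae_kwh / y_true_kwh.mean()
rel_mwh = mae_mwh / y_true_mwh.mean()

print(f"MAE: {mae_kwh:.2f} kWh vs {mae_mwh:.5f} MWh")
print(f"MSE: {mse_kwh:.2f} kWh^2 vs {mse_mwh:.5f} MWh^2")
print(f"MAE / mean target: {rel_kwh:.4f} vs {rel_mwh:.4f}")
```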
