Prediction accuracy of a random forest regression model changes each time I run the model

Question

Every time I run the RF model from the beginning, I get a different accuracy. I ran the following code:

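import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df17_tmp1 = df17_tmp.sample(frac=6, replace = True).reset_index(drop=True)

x_3d = df17_tmp1[col_in_3d]  # Features
y_3d = df17_tmp1['over/under_exc_vol(m3)'].values  # Target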
   
# In[29]:
   
x_train_3d, x_test_3d, y_train_3d, y_test_3d = train_test_split(x_3d, y_3d, test_size = 0.3, random_state = 42)
   
# # train RF

# In[30]:

x_train_3d = x_train_3d.fillna(0).reset_index(drop = True)
x_test_3d = x_test_3d.fillna(0).reset_index(drop = True)

y_train_3d[np.isnan(y_train_3d)] = 0
y_test_3d[np.isnan(y_test_3d)] = 0

rf_3d = RandomForestRegressor(n_estimators = 70, random_state = 42)
rf_3d.fit(x_train_3d, y_train_3d)

# # Predict with RF and evaluate

# In[31]:

prediction_3d = rf_3d.predict(x_test_3d)
mse_3d = mean_squared_error(y_test_3d, prediction_3d)
rmse_3d = mse_3d**.5
abs_diff_3d = np.array(np.abs((y_test_3d - prediction_3d)/y_test_3d))
abs_diff_3d = abs_diff_3d[~np.isinf(abs_diff_3d)]

mape_3d = np.nanmean(abs_diff_3d)*100
accuracy_3d = 100 - mape_3d

I got the following accuracies across runs:

85.94 / 85.71 / 85.83 / 82.64 / 86.56 / 85.24 / 83.40 / 82.39 / 84.98 / 83.81

Is that normal? And which accuracy should be considered?

Yes, what do you think the word "Random" in "RandomForest" means? – Dr. Snoopy, Aug 10 '23 at 11:01
@Dr.Snoopy truth is, they set an explicit `random_state`, both in `RandomForestRegressor()` and in `train_test_split()`. Seems the issue is in `df17_tmp.sample`, which seems to invoke the RNG without a seed being set. — desertnaut, Aug 10 '23 at 18:51

score 0 · Accepted Answer · edited Aug 10 '23 at 23:37

You set random_state in train_test_split(), which makes the split deterministic, and in RandomForestRegressor(), which controls the randomness inside the algorithm. The results still differ because the random sampling you apply to your dataframe here is not seeded:

df17_tmp1 = df17_tmp.sample(frac=6, replace = True).reset_index(drop=True)

You should replace the above line with the following:

df17_tmp1 = df17_tmp.sample(frac=6, replace = True, random_state = 42).reset_index(drop=True)

to get the same output on every run.
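
As a quick check of this behaviour, here is a minimal sketch on a toy dataframe. The dataframe and its column names are made up for illustration and merely stand in for df17_tmp; only the sample() call mirrors the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df17_tmp; the column names are invented.
df = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) * 2.0})

# Without random_state, every call draws a fresh bootstrap sample,
# so two runs will generally produce different rows.
unseeded_a = df.sample(frac=6, replace=True).reset_index(drop=True)
unseeded_b = df.sample(frac=6, replace=True).reset_index(drop=True)

# With a fixed random_state, the same rows are drawn on every call.
seeded_a = df.sample(frac=6, replace=True, random_state=42).reset_index(drop=True)
seeded_b = df.sample(frac=6, replace=True, random_state=42).reset_index(drop=True)

print("seeded samples identical:  ", seeded_a.equals(seeded_b))
print("unseeded samples identical:", unseeded_a.equals(unseeded_b))
```

The seeded pair should compare equal, while the unseeded pair will almost always differ, which is what feeds different data into the model on each run.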

Please refer to the documentation and this thread to learn more.
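
To see the effect end to end, the sketch below runs a seeded version of the whole pipeline twice on synthetic data and compares the resulting RMSE. The data, the column names x1, x2 and y, and the helper run_once are assumptions made for this example, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def run_once(df, seed=42):
    """Hypothetical helper: bootstrap, split, fit and score with one seed."""
    # Seeded bootstrap sample (the step that was unseeded in the question).
    boot = df.sample(frac=6, replace=True, random_state=seed).reset_index(drop=True)
    x = boot[["x1", "x2"]]
    y = boot["y"].values
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed
    )
    rf = RandomForestRegressor(n_estimators=70, random_state=seed)
    rf.fit(x_train, y_train)
    prediction = rf.predict(x_test)
    return mean_squared_error(y_test, prediction) ** 0.5


# Synthetic data, generated once so both runs see the same input.
rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
data["y"] = 3 * data["x1"] - data["x2"] + rng.normal(scale=0.1, size=100)

rmse_first = run_once(data)
rmse_second = run_once(data)
print("same RMSE on both runs:", rmse_first == rmse_second)
```

With every source of randomness seeded, both runs should report the same RMSE.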

answered Aug 10 '23 at 19:55 by Ro.oT · edited Aug 10 '23 at 23:37 by desertnaut

Thank you very much, that really solved the issue @Raktim – Abboud Aug 11 '23 at 09:04
No worries! Happy learning! – Ro.oT Aug 11 '23 at 17:00
