
I am using Random Forest regression on power vs. time data from an experiment that ran for a fixed duration. Using that data, I want to predict the future trend of power with time as the only input. The code I have implemented is shown below.

# Loading the Excel dataset (columns A to C only, first row skipped)
import pandas as pd

df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/Cleaned total data.xlsx', header=None, names=["active_power", "current", "voltage"], usecols="A:C", skiprows=1)
df = df.dropna()

The dataset consists of approximately 30 hours of power vs. time values, as shown below.

[Image: Data frame used for model]

Next, a Random Forest regressor is fitted on the training data. The R2 score achieved on the test data is 0.87.

# Creating X (time in hours, shape (n_samples, 1)) and y (active power)
import numpy as np

X = np.array(series[['time_h']]).reshape(-1, 1)
y = np.array(series['active_power'])


# Splitting dataset in training and testing (85 % / 15 %, shuffled at random)
from sklearn.model_selection import train_test_split

X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.15, random_state=1)


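# Creating Random Forest model and fitting it on training data
# ('mse' was renamed to 'squared_error' in scikit-learn 1.0 and removed in 1.2)
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(n_estimators=128, criterion='squared_error', random_state=1, n_jobs=-1)
forest_fit = forest.fit(X_train2, y_train2)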

# Saving the model and checking the R2 score on test data
import joblib

filename = 'random_forest.sav'
joblib.dump(forest, filename)
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test2, y_test2)
print(result)
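
One caveat about the score above: `train_test_split` shuffles by default, so the test points are interleaved in time with the training points and the score measures interpolation rather than forecasting. The sketch below compares that random split with a chronological one (`shuffle=False`). It runs on synthetic data; the linear ramp, the noise level and the names `r2_random` and `r2_chrono` are assumptions for illustration, not the experiment's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): power rises linearly over 30 hours, plus noise.
rng = np.random.default_rng(0)
time_h = np.linspace(0.0, 30.0, 600).reshape(-1, 1)
power = 2.0 * time_h.ravel() + rng.normal(0.0, 0.5, size=600)

# Random split: test times lie between training times (interpolation).
X_tr, X_te, y_tr, y_te = train_test_split(time_h, power, test_size=0.15, random_state=1)
r2_random = RandomForestRegressor(n_estimators=50, random_state=1).fit(X_tr, y_tr).score(X_te, y_te)

# Chronological split: the last 15 % of the time axis is held out (forecasting).
X_tr, X_te, y_tr, y_te = train_test_split(time_h, power, test_size=0.15, shuffle=False)
r2_chrono = RandomForestRegressor(n_estimators=50, random_state=1).fit(X_tr, y_tr).score(X_te, y_te)

print("R2 with random split:       ", r2_random)
print("R2 with chronological split:", r2_chrono)
```

On data with a trend, the chronological score is usually far lower than the shuffled one, which is a more honest estimate of how the model behaves on future times.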

For future prediction, I created a time array covering 400 hours to use as input to the model, since the power needs to be predicted over that duration.

# Creating a time array for future which will be used as input for future predictions
future_time2 = np.arange(len(series)*15)
future_time2 = future_time2*0.25/360
columns = ['time_hour']
dataframe = pd.DataFrame(data = future_time2, columns = columns)
future_times = dataframe[41006:].to_numpy()
future_times

[Image: Time array created for future prediction]

When predictions are made for the future, the model outputs a single constant value over the entire 400 hours. The predicted output is shown below.

# Predicting power for future
future_pred = loaded_model.predict(future_times)
future_pred

[Image: Output of future_pred]
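
The constant output can be reproduced on synthetic data. A decision tree only learns split thresholds that lie inside the range of training times, so every later time falls into the same right-most leaf of each tree, and the forest's average over trees is therefore the same for all of them. The following is a minimal sketch under assumptions: the linear ramp, the noise and the variable names are made up for illustration and are not the experiment's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in (assumption): power rises linearly over 30 hours, plus noise.
rng = np.random.default_rng(0)
time_h = np.linspace(0.0, 30.0, 600).reshape(-1, 1)
power = 2.0 * time_h.ravel() + rng.normal(0.0, 0.5, size=600)

forest = RandomForestRegressor(n_estimators=50, random_state=1)
forest.fit(time_h, power)

# Query times that all lie beyond the last training time of 30 hours.
future = np.linspace(31.0, 430.0, 200).reshape(-1, 1)
future_pred = forest.predict(future)

# Spread of the future predictions: they are all the same value.
print("max - min of future predictions:", np.ptp(future_pred))
# That value is close to the power seen at the end of the training window.
print("predicted:", future_pred[0], " last observed (mean of 20):", power[-20:].mean())
```

The forest carries the last level it saw forward; it has no way to continue a slope beyond the training range.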

Could someone please explain why the model predicts the same value for the entire duration, and how I can modify the code to get a trend of predictions with reasonable values rather than a single value?

Thank you.

  • First of all, you should understand how a Random Forest regressor works. Does it make sense to train it with only one feature? Probably not. I recommend plotting some trees of your Random Forest ([link](https://stackoverflow.com/questions/40155128/plot-trees-for-a-random-forest-in-python-with-scikit-learn)); you will then understand the reason. – Alex Serra Marrugat Mar 24 '22 at 13:07
  • @AlexSerraMarrugat thank you for your suggestion. What algorithm would you suggest for our data, which has a single feature? We want to predict 400 hours into the future, but we have only 30 hours of past data. – user18551944 Mar 27 '22 at 09:29
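
As one possible direction for the question in the last comment, a parametric model can extrapolate where a tree ensemble cannot. The sketch below fits scikit-learn's `LinearRegression` on synthetic data and predicts beyond the training window. The slowly decaying power curve is an assumption for illustration. Whether a straight line (or any other functional form) is appropriate depends on the physics of the experiment, and extrapolating 400 hours from 30 hours of data carries large uncertainty in any case.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in (assumption): power decays slowly and linearly over 30 hours.
rng = np.random.default_rng(0)
time_h = np.linspace(0.0, 30.0, 600).reshape(-1, 1)
power = 100.0 - 0.2 * time_h.ravel() + rng.normal(0.0, 0.5, size=600)

# A linear model learns a slope, so its predictions keep changing outside the training range.
trend = LinearRegression().fit(time_h, power)

future = np.linspace(30.0, 430.0, 5).reshape(-1, 1)
future_pred = trend.predict(future)

print("estimated slope per hour:", trend.coef_[0])
print("predictions at", future.ravel(), "hours:", future_pred)
```

Unlike the forest, the predictions here follow the fitted slope into the future, for better or worse: any error in the assumed functional form grows with the forecast horizon.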
