I am applying Random Forest regression to power vs. time data from an experiment that ran for a fixed duration. Using that data, I want to predict the future trend of power with time as the only input. The code I have implemented is shown below.
# Imports for loading the data
import numpy as np
import pandas as pd
# Loading the Excel dataset (first row skipped, columns A to C)
df = pd.read_excel('/content/drive/MyDrive/Colab Notebooks/Cleaned total data.xlsx', header=None, names=["active_power", "current", "voltage"], usecols="A:C", skiprows=1)
df = df.dropna()
The dataset contains approximately 30 hours of power vs. time values.
Next, a Random Forest regressor is fitted on the training data. The R2 score achieved on the test data is 0.87.
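# Imports for the modelling steps
import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Creating X (time in hours, as a column vector) and y (active power)
# `series` is the prepared DataFrame with 'time_h' and 'active_power' columns (its preparation is not shown here)
X = series[['time_h']].to_numpy()
y = series['active_power'].to_numpy()
# Splitting the dataset randomly into training (85%) and testing (15%) sets
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.15, random_state=1)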
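# Creating the Random Forest model and fitting it on the training data
# ('squared_error' is the current scikit-learn name for the criterion formerly called 'mse')
forest = RandomForestRegressor(n_estimators=128, criterion='squared_error', random_state=1, n_jobs=-1)
forest_fit = forest.fit(X_train2, y_train2)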
# Saving the model and checking the R2 score on test data
filename = 'random_forest.sav'
joblib.dump(forest, filename)
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test2, y_test2)
print(result)
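Since the 0.87 score comes from a random split, the test times are interleaved with the training times, so it mainly measures interpolation. A minimal sketch of comparing that with a chronological hold-out is given here. It uses made-up synthetic data (a noisy linear decay), not the experiment data, and the names `time_h`, `power`, `random_r2` and `chrono_r2` are only illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the experiment: 30 hours of noisy, slowly decaying power
rng = np.random.default_rng(1)
time_h = np.linspace(0, 30, 2000)
power = 100 - 1.5 * time_h + rng.normal(0, 1, time_h.size)
X = time_h.reshape(-1, 1)

# Random split: test points lie between training points (interpolation)
X_tr, X_te, y_tr, y_te = train_test_split(X, power, test_size=0.15, random_state=1)
model_random = RandomForestRegressor(n_estimators=50, random_state=1).fit(X_tr, y_tr)
random_r2 = r2_score(y_te, model_random.predict(X_te))

# Chronological split: the last 15% of the timeline is held out (extrapolation)
split = int(len(X) * 0.85)
model_chrono = RandomForestRegressor(n_estimators=50, random_state=1).fit(X[:split], power[:split])
chrono_r2 = r2_score(power[split:], model_chrono.predict(X[split:]))

print("R2 with random split:       ", random_r2)
print("R2 with chronological split:", chrono_r2)
```

On this synthetic series the chronological score comes out far lower than the random-split score, which suggests that a high R2 from a random split says little about how the model behaves on future times.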
For future prediction, I created a time array covering 400 hours to use as input to the model, since the power needs to be predicted over that duration.
# Creating a time array for future which will be used as input for future predictions
future_time2 = np.arange(len(series)*15)
future_time2 = future_time2*0.25/360
columns = ['time_hour']
dataframe = pd.DataFrame(data = future_time2, columns = columns)
future_times = dataframe[41006:].to_numpy()
future_times
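The same future time grid can probably be built without going through a DataFrame. The sketch below assumes that `len(series)` equals 41006 (the index used in the slice above) and that each sample is 0.25/360 hours apart, as implied by the scaling factor; `step_h` and `n_observed` are names introduced only for this illustration.

```python
import numpy as np

# Assumptions taken from the snippet above: 41006 observed samples,
# each 0.25/360 hours apart, and a total horizon of 15x the observed length
step_h = 0.25 / 360
n_observed = 41006
n_total = n_observed * 15

# Sample indices after the observed data, converted to hours, as a column vector
future_times = (np.arange(n_observed, n_total) * step_h).reshape(-1, 1)

print(future_times.shape)
print(future_times[0, 0], future_times[-1, 0])
```

The array starts right after the last observed sample and spans just under 400 hours, matching the intended prediction window.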
When I predict for these future times, the model outputs a single constant value over the entire 400 hours. The predictions are produced as follows.
# Predicting power for future
future_pred = loaded_model.predict(future_times)
future_pred
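The constant output can be reproduced on synthetic data, which points to the model itself rather than the data loading. A tree-based regressor predicts by averaging training targets inside leaf regions, so every input beyond the largest training time falls into the same outermost leaf of each tree. The sketch below uses made-up data with a clear upward trend; `train_time` and `train_power` are illustrative names, not the experiment data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data: 30 hours with a clear upward trend plus noise
rng = np.random.default_rng(0)
train_time = np.linspace(0, 30, 1000).reshape(-1, 1)
train_power = 50 + 2 * train_time.ravel() + rng.normal(0, 0.5, 1000)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train_time, train_power)

# Future times lie entirely outside the training range of 0 to 30 hours
future_time = np.linspace(31, 430, 200).reshape(-1, 1)
future_pred = model.predict(future_time)

# Spread of the future predictions; zero means a flat line
print("spread of future predictions:", future_pred.max() - future_pred.min())
print("largest training target:     ", train_power.max())
```

Even though the training data rises steadily, every future prediction is the same value and never exceeds the largest training target, which is the same flat output described above.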
Could someone please explain why the model predicts the same value for the entire duration, and how I can modify the code to get a trend of reasonable predicted values rather than a single value?
Thank you.