I have air pollution time series data and need to make a forward-period estimation (a forecast). I used RandomForestRegressor from scikit-learn to make the predictions and want to visualize them, but I can't get the x-axis to show the correct time index. How can I get a better visualization for the regression approach below? Is there a better way to do this?
my attempt
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
url = "https://gist.githubusercontent.com/jerry-shad/36912907ba8660e11cd27be0d3e30639/raw/424f0891dc46d96cd5f867f3d2697777ac984f68/pollution.csv"
df = pd.read_csv(url, parse_dates=['date'])
df.date = pd.DatetimeIndex(df.date)
# df.sort_values(by='date').reset_index(drop=True)
df = df.drop(columns=['Unnamed: 0'])
resultsDict={}
predictionsDict={}
split_date ='2017-12-01'
df_training = df.loc[df.date <= split_date]
df_test = df.loc[df.date > split_date]
## exclude the target (pollution_index) and the datetime column, which StandardScaler cannot scale
df_tr = df_training.drop(['pollution_index', 'date'], axis=1)
df_te = df_test.drop(['pollution_index', 'date'], axis=1)
## scaling features
scaler = StandardScaler()
scaler.fit(df_tr)
X_train = scaler.transform(df_tr)
y_train = df_training['pollution_index']
X_test = scaler.transform(df_te)
y_test = df_test['pollution_index']
X_train_df = pd.DataFrame(X_train,columns=df_tr.columns)
X_test_df = pd.DataFrame(X_test,columns=df_te.columns)
reg = RandomForestRegressor(max_depth=2, random_state=0)
reg.fit(X_train, y_train)
yhat = reg.predict(X_test)
resultsDict['Randomforest'] = evaluate(y_test, yhat)  # evaluate() is a metrics helper not shown in this post
predictionsDict['Randomforest'] = yhat
## print out prediction from RandomForest
print(predictionsDict['Randomforest'])
plt.plot(df_test['pollution_index'].values , label='Original')
plt.plot(yhat,color='red',label='predicted')
plt.legend()
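The `evaluate` helper called above is not defined anywhere in the post. A hypothetical stand-in, assuming it is meant to return a few common regression metrics (the metric names and the dict layout are my assumptions, not the original helper), might look like this:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred):
    """Hypothetical stand-in for the missing helper: common regression metrics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),  # root of the mean squared error
        "r2": r2_score(y_true, y_pred),
    }


# Tiny usage example with made-up numbers
metrics = evaluate([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```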
output of current attempt
Here is the output of the above attempt:
In this attempt, I fit a random forest regressor and tried to make a simple plot, but the plot does not show time on the x-axis. Why is that, and how can I fix it? Thanks.
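A likely cause: `df_test['pollution_index'].values` and `yhat` are both plain NumPy arrays, so matplotlib has no dates to work with and plots against positions 0..n-1. A minimal sketch of passing the dates explicitly is below. The data is synthetic (the `df_test` and `yhat` here are stand-ins, and the daily frequency is an assumption), so only the plotting pattern carries over:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for df_test and yhat (assumed daily data after the split date)
dates = pd.date_range("2017-12-02", periods=60, freq="D")
rng = np.random.default_rng(0)
df_test = pd.DataFrame({"date": dates, "pollution_index": rng.normal(50, 5, size=60)})
yhat = df_test["pollution_index"].to_numpy() + rng.normal(0, 2, size=60)

# Attach the dates to the predictions so both lines share the same time axis
pred = pd.Series(yhat, index=df_test["date"], name="predicted")

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df_test["date"], df_test["pollution_index"], label="Original")
ax.plot(pred.index, pred.values, color="red", label="predicted")
ax.set_xlabel("date")
ax.set_ylabel("pollution_index")
ax.legend()
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

With the real data, the same idea would be to plot `df_test['date']` on the x-axis for both lines instead of relying on the default positional index.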
desired plot
Ideally, after training the model, I want to make a forward-period estimation. This is the kind of plot I would like to produce from the attempt above:
Can anyone suggest a way to visualize the regression output correctly?
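The desired plot is not reproduced here, so the following is only one possible layout: training history, held-out actuals, and predictions on a shared date axis, with the train/test split marked. `plot_forecast` is a hypothetical helper, and the series is synthetic rather than the pollution data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_forecast(train, test, pred, split_date, ax=None):
    """Plot history, held-out actuals and predictions on one date axis.

    train, test and pred are pandas Series with a DatetimeIndex (hypothetical helper).
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 4))
    ax.plot(train.index, train.values, color="gray", label="training data")
    ax.plot(test.index, test.values, color="tab:blue", label="actual (test)")
    ax.plot(pred.index, pred.values, color="red", linestyle="--", label="predicted")
    ax.axvline(pd.Timestamp(split_date), color="black", linestyle=":", label="train/test split")
    ax.set_xlabel("date")
    ax.set_ylabel("pollution_index")
    ax.legend()
    ax.figure.autofmt_xdate()
    return ax


# Synthetic daily series standing in for pollution_index
dates = pd.date_range("2017-09-01", "2018-02-28", freq="D")
rng = np.random.default_rng(0)
values = 50 + 10 * np.sin(np.arange(len(dates)) / 15) + rng.normal(0, 2, len(dates))
series = pd.Series(values, index=dates, name="pollution_index")

split_date = "2017-12-01"
train = series.loc[series.index <= split_date]
test = series.loc[series.index > split_date]
pred = test + rng.normal(0, 3, len(test))  # fake predictions, only for the plot

ax = plot_forecast(train, test, pred, split_date)
```

To extend the red line past the last observed date, `pd.date_range` can build the future index, but the model would still need feature values for those dates, which this sketch does not address.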