I'm building a random forest model in Python with sklearn as a baseline to compare with predictions from an RNN built in Keras (already completed predictions with the RNN...yay!). The data is time-series. Conceptually it's 623 segments, each containing 180 sequential data points (padded so that all segments have equal length), with each segment having 7 feature channels and one target channel.
I've got the data prepped and split into training and test groups. The data is currently stored in numpy.ndarray objects with the shapes below.
X.shape: (623, 180, 7)
y.shape: (623, 180, 1)
X_train.shape: (498, 180, 7)
y_train.shape: (498, 180, 1)
X_test.shape: (125, 180, 7)
y_test.shape: (125, 180, 1)
Since this is a regression problem, I'm trying to use RandomForestRegressor as below. However, the call to fit fails because it expects a 2D array and X_train is 3D.
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=1000)
# Raises ValueError: X_train is 3D, but the estimator expects at most 2 dimensions
rf.fit(X_train, y_train)
So I need to reshape or somehow restructure the data into a 2D format. Right now I'm thinking of iterating over the 3D array to build a 2D array. For the features, that 2D array would have 7 columns (one per channel), with each cell holding one segment's time series for that channel as a list or series.
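To make the question concrete, here is a minimal sketch of two 2D layouts that numpy.reshape can produce without any iteration. It uses random placeholder arrays with the training shapes listed above (an assumption, not my real data), and the names X_steps, y_steps, X_segments and y_segments are made up for illustration. I'm not sure either layout is the right choice for this problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder arrays with the training shapes from the question (random values).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(498, 180, 7))
y_train = rng.normal(size=(498, 180, 1))

n_segments, n_steps, n_features = X_train.shape

# Layout A: one row per time step, one column per feature channel.
X_steps = X_train.reshape(n_segments * n_steps, n_features)  # (89640, 7)
y_steps = y_train.reshape(n_segments * n_steps)              # (89640,)

# Layout B: one row per segment, all 180 * 7 values flattened into columns.
X_segments = X_train.reshape(n_segments, n_steps * n_features)  # (498, 1260)
y_segments = y_train.reshape(n_segments, n_steps)               # (498, 180)

# Small forest on the first 5 segments of layout A, only to show that
# fit and predict accept the 2D shapes.
rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(X_steps[: 5 * n_steps], y_steps[: 5 * n_steps])
pred = rf.predict(X_steps[:5])
```

Layout A treats every time step as an independent sample, so the ordering within a segment is lost and padded steps become ordinary rows. Layout B keeps each segment together as one sample and makes the target a 180-value multi-output, at the cost of 1260 feature columns.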
Any other ideas out there on how to restructure this data? Any other advice is appreciated.
Thanks ahead of time.
For reference, I've looked over these links:
Reshaping 3D Numpy Array to a 2D array
numpy with python: convert 3d array to 2d
Sklearn Error, array with 4 dim. Estimator <=2
https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/