
I'm building a random forest model in Python with sklearn as a baseline to compare against predictions from an RNN built in Keras (already completed predictions with the RNN...yay!). The data is time series: conceptually, 623 segments each containing 180 sequential datapoints (padded to create equal-length segments), with each segment having 7 feature channels and one target channel.

I've got the data prepped and split into training and test groups. The data is currently contained in numpy.ndarray containers with the shapes below.

X.shape: (623, 180, 7)
y.shape: (623, 180, 1)
X_train.shape: (498, 180, 7)
y_train.shape: (498, 180, 1)
X_test.shape: (125, 180, 7)
y_test.shape: (125, 180, 1)

Since I'm doing a regression model, I'm trying to use RandomForestRegressor as below. However, this fails because the estimator expects a 2D array.

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=1000)
rf.fit(X_train, y_train)

So I need to reshape or otherwise restructure the data into a 2D format. Right now I'm thinking of iterating over the 3D array to build a 2D array with 7 feature columns (one per channel), where each cell holds that segment's time series as a list or Series.

Any other ideas out there on how to restructure this data? Any other advice is appreciated.

Thanks ahead of time.

For reference, I've looked over these links:

Reshaping 3D Numpy Array to a 2D array

numpy with python: convert 3d array to 2d

Sklearn Error, array with 4 dim. Estimator <=2

https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/

  • I think you should merge the first two dimensions and then it will be a 2-d array of shape `(498*180, 7)`. Then you can send that to RF. – Vivek Kumar Jun 08 '18 at 05:16
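A minimal sketch of that comment's suggestion, using the array names and shapes from the question (everything else here is illustrative):

# Merge the segment and time axes so each row is one timestep
# with its 7 feature channels: (498, 180, 7) -> (89640, 7).
X_train_2d = X_train.reshape(-1, X_train.shape[2])
y_train_2d = y_train.reshape(-1)                 # one target value per timestep

X_test_2d = X_test.reshape(-1, X_test.shape[2])  # (22500, 7)
y_test_2d = y_test.reshape(-1)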

3 Answers


Another option is to create one-dimensional arrays and then use np.c_[variable_1, variable_2, variable_n] to concatenate the variables you want to include in your training data:

my_regressor_forest.fit(np.c_[column1, column2], my_class_column)
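For instance, a small self-contained sketch with made-up arrays (column1, column2, and my_class_column are placeholders from this answer, not names from the question):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Two illustrative 1-D feature arrays and a 1-D target of the same length.
rng = np.random.default_rng(0)
column1 = rng.random(100)
column2 = rng.random(100)
my_class_column = rng.random(100)

# np.c_ stacks the 1-D arrays as columns, producing a 2-D (100, 2) feature matrix.
my_regressor_forest = RandomForestRegressor(n_estimators=100)
my_regressor_forest.fit(np.c_[column1, column2], my_class_column)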

I would do one of two things:

  1. Concatenate (you would end up with a 180*7 = 1260-dimensional feature vector per segment).
  2. Use dimensionality reduction (PCA gives you the dimensions that explain most of the variance in your data); see the sketch after this list.
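A possible sketch of the PCA route, assuming the segments are first flattened to one row of 180*7 = 1260 features and using an illustrative 95% explained-variance threshold (not a value from this answer):

from sklearn.decomposition import PCA

# Flatten each segment to one row of 1260 features.
X_train_flat = X_train.reshape(X_train.shape[0], -1)   # (498, 1260)
X_test_flat = X_test.reshape(X_test.shape[0], -1)      # (125, 1260)

# Keep enough principal components to explain ~95% of the variance,
# fitting on the training data only and applying the same transform to the test data.
pca = PCA(n_components=0.95)
X_train_red = pca.fit_transform(X_train_flat)
X_test_red = pca.transform(X_test_flat)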

Try using X_train.reshape(X_train.shape[0], 180*7)

The resultant shape will be (498, 1260) for X_train, or (623, 1260) if you reshape the full X.
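Putting that together with the fit, here is a sketch assuming the target is flattened the same way (RandomForestRegressor accepts a 2-D y of shape (n_samples, 180) as a multi-output regression):

from sklearn.ensemble import RandomForestRegressor

# One row per segment: (498, 180, 7) -> (498, 1260) features
# and (498, 180, 1) -> (498, 180) multi-output targets.
X_train_flat = X_train.reshape(X_train.shape[0], -1)
y_train_flat = y_train.reshape(y_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(X_train_flat, y_train_flat)

# Predictions come back as (125, 180); add the channel axis back
# if you want to compare against the RNN output shape (125, 180, 1).
y_pred = rf.predict(X_test_flat)
y_pred_3d = y_pred[:, :, None]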
