
I'm building a random forest model in Python with sklearn as a baseline to compare against predictions from an RNN built in Keras (already completed predictions with the RNN...yay!). The data is time series: conceptually, 623 segments each containing 180 sequential datapoints (padded to create equal-length segments), with each segment having 7 feature channels and one target channel.

I've got the data prepped and split into training and test groups. The data is currently contained in numpy.ndarray containers with the shapes below.

X.shape: (623, 180, 7)
y.shape: (623, 180, 1)
X_train.shape: (498, 180, 7)
y_train.shape: (498, 180, 1)
X_test.shape: (125, 180, 7)
y_test.shape: (125, 180, 1)

Since I'm doing a regression model, I'm trying to use RandomForestRegressor as below. However, this fails because the estimator expects a 2D array.

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=1000)
rf.fit(X_train, y_train)

So I need to reshape or otherwise restructure the data into a 2D format. Right now I'm thinking of iterating over the 3D array to build a 2D array with 7 feature columns (one per channel), where each cell holds that segment's time series as a list or Series.

Any other ideas out there on how to restructure this data? Any other advice is appreciated.

Thanks ahead of time.

For reference, I've looked over these links:

Reshaping 3D Numpy Array to a 2D array

numpy with python: convert 3d array to 2d

Sklearn Error, array with 4 dim. Estimator <=2

https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/

  • I think you should merge the first two dimensions and then it will be a 2-d array of shape `(498*180, 7)`. Then you can send that to RF. – Vivek Kumar Jun 08 '18 at 05:16
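A minimal sketch of that comment's suggestion, using the array names and shapes from the question (everything else here is illustrative):

# Merge the segment and time axes so each row is one timestep
# with its 7 feature channels: (498, 180, 7) -> (89640, 7).
X_train_2d = X_train.reshape(-1, X_train.shape[2])
y_train_2d = y_train.reshape(-1)                 # one target value per timestep

X_test_2d = X_test.reshape(-1, X_test.shape[2])  # (22500, 7)
y_test_2d = y_test.reshape(-1)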

3 Answers


Another option is to create one-dimensional arrays and then use np.c_[variable_1, variable_2, variable_n] to concatenate the variables you want to include in your training data:

my_regressor_forest.fit(np.c_[column1, column2], my_class_column)
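For instance, a small self-contained sketch with made-up arrays (column1, column2, and my_class_column are placeholders from this answer, not names from the question):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Two illustrative 1-D feature arrays and a 1-D target of the same length.
rng = np.random.default_rng(0)
column1 = rng.random(100)
column2 = rng.random(100)
my_class_column = rng.random(100)

# np.c_ stacks the 1-D arrays as columns, producing a 2-D (100, 2) feature matrix.
my_regressor_forest = RandomForestRegressor(n_estimators=100)
my_regressor_forest.fit(np.c_[column1, column2], my_class_column)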

I would do one of two things:

  1. Concatenate (you would end up with a 180*7 = 1260-dimensional feature vector per segment).
  2. Use dimensionality reduction (PCA gives you the dimensions that explain most of the variance in your data); see the sketch after this list.
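A possible sketch of the PCA route, assuming the segments are first flattened to one row of 180*7 = 1260 features and using an illustrative 95% explained-variance threshold (not a value from this answer):

from sklearn.decomposition import PCA

# Flatten each segment to one row of 1260 features.
X_train_flat = X_train.reshape(X_train.shape[0], -1)   # (498, 1260)
X_test_flat = X_test.reshape(X_test.shape[0], -1)      # (125, 1260)

# Keep enough principal components to explain ~95% of the variance,
# fitting on the training data only and applying the same transform to the test data.
pca = PCA(n_components=0.95)
X_train_red = pca.fit_transform(X_train_flat)
X_test_red = pca.transform(X_test_flat)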

Try using X_train.reshape(X_train.shape[0], 180*7)

The resultant shape will be (498, 1260) for X_train, or (623, 1260) if you reshape the full X.
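Putting that together with the fit, here is a sketch assuming the target is flattened the same way (RandomForestRegressor accepts a 2-D y of shape (n_samples, 180) as a multi-output regression):

from sklearn.ensemble import RandomForestRegressor

# One row per segment: (498, 180, 7) -> (498, 1260) features
# and (498, 180, 1) -> (498, 180) multi-output targets.
X_train_flat = X_train.reshape(X_train.shape[0], -1)
y_train_flat = y_train.reshape(y_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(X_train_flat, y_train_flat)

# Predictions come back as (125, 180); add the channel axis back
# if you want to compare against the RNN output shape (125, 180, 1).
y_pred = rf.predict(X_test_flat)
y_pred_3d = y_pred[:, :, None]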
