
I am training on the data with a supervised learning algorithm, the Random Forest classifier:

    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=50, n_jobs=3, random_state=42)

The parameters in the grid are:

    param_grid = {
        'n_estimators': [200, 700],
        'max_features': ['auto', 'sqrt', 'log2'],
        'max_depth': [5, 10],
        'min_samples_split': [5, 10],
    }
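GridSearchCV tries every combination of the values in param_grid, so this grid yields 2 × 3 × 2 × 2 = 24 candidate settings, each fitted once per cross-validation fold (the fold count comes from the cv argument: 5 by default in recent scikit-learn releases, 3 in older ones). A minimal standard-library sketch of that Cartesian product, assuming it matches how the grid is expanded; scikit-learn does this with its own ParameterGrid class, and this is not that code:

```python
from itertools import product

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [5, 10],
    'min_samples_split': [5, 10],
}

def expand_grid(grid):
    """Return every combination of the grid's values as a list of dicts."""
    keys = sorted(grid)  # fixed key order so the output is deterministic
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
print(len(candidates))  # 24
print(candidates[0])
```

With 700 trees in half of the candidates, the full search is costly, which is worth keeping in mind once the fit itself works.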

Classifier "clf" and parameter grid "param_grid" are passed to GridSearchCV:

    from sklearn.model_selection import GridSearchCV
    clf_rfc = GridSearchCV(estimator=clf, param_grid=param_grid)

When I fit the classifier on the features and labels with

    clf_rfc.fit(X_train, y_train)

I get the error "Too many indices in the array". The shape of X_train is (204, 3) and the shape of y_train is (204, 1).
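The difference between the two shapes is easy to see in NumPy: a (204, 1) array is a 2-D column, while scikit-learn classifiers expect y as a 1-D array of shape (204,). A small sketch using placeholder zero arrays with the shapes from the question (the values are dummies, not the asker's data):

```python
import numpy as np

# Placeholder data with the shapes from the question (values are dummies).
X_train = np.zeros((204, 3))
y_column = np.zeros((204, 1))  # 2-D: one row per sample, one column

print(X_train.shape, y_column.shape)  # (204, 3) (204, 1)
print(y_column.ndim)                  # 2

y_flat = y_column.ravel()             # drop the length-1 axis
print(y_flat.shape, y_flat.ndim)      # (204,) 1
```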

I also tried clf_rfc.fit(X_train.values, y_train.values), but the error persisted.

Any suggestions would be appreciated!

Vivek Kumar
user3447653

3 Answers


As mentioned in a previous post, the problem appears to be y_train, whose shape is (204, 1). It should be (204,) instead of (204, 1); click here for more info.

So if you reshape y_train, everything should be fine:

    n_samples, _ = y_train.shape           # (204, 1) -> n_samples = 204
    y_train = y_train.reshape(n_samples,)  # now shape (204,)

If that raises an error such as AttributeError: 'DataFrame' object has no attribute 'reshape', then try:

    n_samples, _ = y_train.shape
    y_train = y_train.values.reshape(n_samples,)  # DataFrame -> NumPy array -> 1-D

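Both variants assume y_train holds a single label column. A slightly more defensive sketch uses a hypothetical helper, to_1d_labels (not part of scikit-learn or pandas), which converts NumPy arrays, lists or DataFrames with np.asarray and refuses inputs with more than one column, since flattening those would silently mix labels:

```python
import numpy as np

def to_1d_labels(y):
    """Hypothetical helper: return labels as a 1-D NumPy array.

    np.asarray accepts NumPy arrays, lists and pandas objects alike.
    """
    arr = np.asarray(y)
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.reshape(-1)  # (n, 1) -> (n,)
    raise ValueError(f"expected one label column, got shape {arr.shape}")

y_train = np.arange(204).reshape(204, 1)
print(to_1d_labels(y_train).shape)  # (204,)
```

Usage would then be clf_rfc.fit(X_train, to_1d_labels(y_train)).
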
Rafael Valero

The shape of the y_train DataFrame is not correct. Try this:

    # y_train[0] selects the column labelled 0 (the default name when no column names are given)
    clf_rfc.fit(X_train, y_train[0].values)

Or:

    clf_rfc.fit(X_train, y_train.values.ravel())


y_train should be a 1-dimensional array.

I tried clf_rfc.fit(X_train, y_train.flatten()), and it worked!
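ravel() and flatten() both return a 1-D array, with one difference: flatten() always copies the data, while ravel() returns a view of the original array when it can. Both are NumPy methods, so a pandas DataFrame has to be converted with .values first. A quick check on placeholder labels:

```python
import numpy as np

y_train = np.arange(204).reshape(204, 1)  # placeholder labels, shape (204, 1)

flat_copy = y_train.flatten()  # always a new array
flat_view = y_train.ravel()    # a view here, because y_train is contiguous

print(flat_copy.shape, flat_view.shape)       # (204,) (204,)
print(np.shares_memory(y_train, flat_copy))  # False
print(np.shares_memory(y_train, flat_view))  # True
```

For passing labels to fit, either works; ravel() just avoids an extra copy.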

Grimthorr
Simon J