
I am attempting to solve Problem 6 in this notebook. The task is to train a simple model on this data using 50, 100, 1000 and 5000 training samples with the LogisticRegression model from sklearn.linear_model.

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(train_dataset, train_labels)

This is the code I am trying to run, and it gives me the error:

ValueError: Found array with dim 3. Estimator expected <= 2.

Any idea?


5 Answers


scikit-learn expects a 2D numeric array as the training dataset for fit. The dataset you are passing in is a 3D array, so you need to reshape it into a 2D one:

nsamples, nx, ny = train_dataset.shape
d2_train_dataset = train_dataset.reshape((nsamples,nx*ny))
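As a minimal end-to-end sketch of the reshape (the 100 x 28 x 28 dataset below is synthetic, chosen only to mirror typical image data; it is not the asker's actual data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 3D dataset: 100 samples of 28x28 "images" with binary labels
rng = np.random.default_rng(0)
train_dataset = rng.normal(size=(100, 28, 28))
train_labels = rng.integers(0, 2, size=100)

# Flatten each 28x28 sample into a 784-element feature vector
nsamples, nx, ny = train_dataset.shape
d2_train_dataset = train_dataset.reshape((nsamples, nx * ny))
print(d2_train_dataset.shape)  # (100, 784)

# fit now succeeds because the input is 2D: (n_samples, n_features)
lr = LogisticRegression()
lr.fit(d2_train_dataset, train_labels)
```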
  • Would you mind explaining how ndarray.reshape can magically transform 3D data to 2D without losing the information represented by the original vectors? – scipilot Aug 27 '17 at 02:01
  • The first dimension is maintained and the other two dimensions are flattened (so 28x28 becomes 784). The fit algorithm will then consider the first 784 features part of sample number one, the next 784 features part of sample number two, and so on. – Andrés Marafioti Sep 07 '17 at 13:49
  • I have my X data separate from my y labels. How can I flatten the X train dataset using this answer? The y labels are a 5k array; X_train is 5k x 1024 x 1024. – BluePython Nov 11 '17 at 20:58
  • You are correct, but how did you get that information? The [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit) clearly says `Parameters: X : {array-like, sparse matrix}, shape (n_samples, n_features)` – kyriakosSt Apr 26 '18 at 15:45
  • So if I reshaped my data this way, fitted it, and predicted my y, the shape of my y is now (nsamples, 1). But what I need y to be is (nx, ny), because I need a label for each pixel. What should I do in this case? – Nathan Apr 30 '20 at 05:31
  • This is the wrong answer: nrgb is missing as the 4th parameter. – Irfan Shaikh Aug 09 '23 at 21:59

With LSTM, GRU, and TCN layers, return_sequences in the last recurrent layer before the Dense layer must be set to False. This is one of the conditions that produces this error message.
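The shape difference can be sketched with plain numpy (the batch/timestep/unit sizes below are illustrative, not taken from any actual model):

```python
import numpy as np

batch, timesteps, units = 4, 10, 8

# return_sequences=True: the layer emits a hidden state at every timestep,
# so downstream layers receive a 3D array and 2D consumers raise errors.
full_sequence = np.zeros((batch, timesteps, units))

# return_sequences=False: only the last timestep's hidden state is emitted,
# giving the 2D (batch, units) shape that a Dense layer expects.
last_state = full_sequence[:, -1, :]

print(full_sequence.shape, last_state.shape)  # (4, 10, 8) (4, 8)
```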


If anyone is stumbling onto this question after using an LSTM or any other RNN for two or more time series, this might be a solution.

If you want the error between two different predicted values, for example when you are trying to predict two completely different time series, you can do the following:

import numpy as np
from sklearn.metrics import mean_squared_error
# Any sklearn metric that only accepts 2D data
# 3D data
real = np.array([
    [
        [1,60],
        [2,70],
        [3,80]
    ],
    [
        [2,70],
        [3,80],
        [4,90]
    ]
]) 

pred = np.array([
    [
        [1.1,62.1],
        [2.1,72.1],
        [3.1,82.1]
    ],
    [
        [2.1,72.1],
        [3.1,82.1],
        [4.1,92.1]
    ]
])

# Error/Some Metric on Feature 1:
print(mean_squared_error(real[:,:,0], pred[:,:,0]))  # ≈ 0.01

# Error/Some Metric on Feature 2:
print(mean_squared_error(real[:,:,1], pred[:,:,1]))  # ≈ 4.41

See the numpy indexing documentation for additional info.


I had a similar error while solving an image classification problem. We have a 3D array: the first dimension is the total number of images (it can be replaced by -1), the second dimension is the product of the height and the width of the picture, and the third dimension is equal to three, since an RGB image has three channels (red, green, blue). If we don't want to lose information about the color of the image, we use x_train.reshape(-1, nx*ny*3). If the color can be neglected, we must first drop or average the channel axis and then flatten to x_train.reshape(-1, nx*ny), which also reduces the size of the matrix; reshape alone cannot discard the channel data.
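Both variants can be sketched as follows (the 10 images of 4x4 pixels are hypothetical sizes chosen for illustration):

```python
import numpy as np

# Hypothetical batch of 10 RGB images, 4x4 pixels, 3 channels
x_train = np.arange(10 * 4 * 4 * 3, dtype=float).reshape(10, 4, 4, 3)
n, nx, ny, nc = x_train.shape

# Keep the color information: flatten each image to nx*ny*3 features
flat_rgb = x_train.reshape(-1, nx * ny * nc)
print(flat_rgb.shape)  # (10, 48)

# Neglect color: average the channel axis first, then flatten to nx*ny features
gray = x_train.mean(axis=-1)
flat_gray = gray.reshape(-1, nx * ny)
print(flat_gray.shape)  # (10, 16)
```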


You probably have the last LSTM layer in your model using return_sequences=True. Change this to False so the layer returns only its final output instead of the full sequence.