bad input shape when using labelEncoder

Question

I have a fairly large dataframe with both numerical and categorical values. I'm trying to encode the categorical values but am getting the above error.

Here's a simple version of the code:

from collections import defaultdict
from sklearn.preprocessing import LabelEncoder

d = defaultdict(LabelEncoder)
# Encoding the variable
fit = df[catgoricalValues].apply(lambda x: d[x.name].fit_transform(df[catgoricalValues]))

I'm using the approach described here, except instead of applying it on the entire dataframe, I specified the columns to encode.

I get this error:

ValueError: bad input shape (490546, 11)

Does this answer your question? [Label encoding across multiple columns in scikit-learn](https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn) — Ben Reiniger, Aug 28 '20 at 00:29

Sy Ker · Answer 1 · 2020-08-27T19:21:50.847

Update

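It seems you are trying to apply the LabelEncoder to multiple columns. LabelEncoder expects a single 1-D column, but your lambda passes the whole df[catgoricalValues] frame on every call instead of the column x, which is why the error reports shape (490546, 11). You can apply one LabelEncoder to all columns: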

from sklearn.preprocessing import LabelEncoder

encoded = df[categoricalVal].apply(LabelEncoder().fit_transform)

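It is better to use a new encoder for each column, though: a single shared encoder is refit on every column and only remembers the classes of the last one, so the encoding cannot be reversed for the others. The link above should provide you with the solution.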
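A minimal sketch of the one-encoder-per-column idea, including the reverse step. The toy frame, the column names, and the variable names (`categorical_cols`, `encoders`) are made up for illustration and are not from the question. The `astype(str)` call is an assumption: it may avoid the "argument must be a string or number" error when a column mixes types or contains missing values, at the cost of turning missing values into an ordinary string category.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [1.0, 2.5, 3.0, 4.2],
})
categorical_cols = ["color", "size"]  # assumed list of columns to encode

# One LabelEncoder per column, created on first access by column name.
encoders = defaultdict(LabelEncoder)

# Encode each column on its own: fit_transform receives a 1-D column,
# never the whole sub-frame.
encoded = df.copy()
for col in categorical_cols:
    # Assumption: casting to str sidesteps mixed-type or NaN values.
    encoded[col] = encoders[col].fit_transform(df[col].astype(str))

# Reverse the encoding later with the encoder that was fitted on that column.
decoded = encoded.copy()
for col in categorical_cols:
    decoded[col] = encoders[col].inverse_transform(encoded[col])
```

Because each column keeps its own fitted encoder in `encoders`, `inverse_transform` can recover the original labels column by column, and the numeric column is left untouched.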

edited Aug 27 '20 at 19:21

answered Aug 27 '20 at 19:04


I received the same error - ValueError: bad input shape (490546, 11) – Lostsoul Aug 27 '20 at 19:06
Thank you for the update. Is there a better way to do what I'm trying to do? I just have a list of categorical values and I want to encode them all (then reverse the encoding later). When I ran your code I got this error - TypeError: argument must be a string or number – Lostsoul Aug 27 '20 at 19:28
Okay, can you provide a small example of your dataset? – Sy Ker Aug 27 '20 at 21:12
