I have a fairly large dataframe with both numerical and categorical values. I'm trying to encode the categorical values but am getting the above error.

Here's a simple version of the code:

from collections import defaultdict
d = defaultdict(LabelEncoder)
# Encoding the variable
fit = df[catgoricalValues].apply(lambda x: d[x.name].fit_transform(df[catgoricalValues]))

I'm using the approach described here, except instead of applying it on the entire dataframe, I specified the columns to encode.

I get this error:

ValueError: bad input shape (490546, 11)
Lostsoul
  • Does this answer your question? [Label encoding across multiple columns in scikit-learn](https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn) – Ben Reiniger Aug 28 '20 at 00:29

1 Answer

Update

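It seems you are trying to apply LabelEncoder to multiple columns at once. LabelEncoder expects a 1-D input, so passing the whole df[catgoricalValues] frame inside the lambda (instead of the single column x) raises the bad input shape error. You can apply the same LabelEncoder to all columns: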

from sklearn.preprocessing import LabelEncoder

encoded = df[categoricalVal].apply(LabelEncoder().fit_transform)

It is better to use a new encoder for each column, so that each column's mapping is kept and the encoding can be reversed later. The link above should provide you with the solution.
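As a rough sketch of the per-column approach, something like the following may work. The sample dataframe and the names categorical_cols, encoders, and as_text are made up for illustration. Casting to str is an assumption on my part: it is one common way to avoid the "argument must be a string or number" TypeError when a column mixes types, but it means the decoded values come back as strings.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data standing in for the real dataframe
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", 3, "L"],      # mixed str/int values in one column
    "price": [1.0, 2.5, 3.0, 4.2],   # numeric column, left untouched
})
categorical_cols = ["color", "size"]

# One LabelEncoder per column, created on first access by column name
encoders = defaultdict(LabelEncoder)

# Cast to str so every column holds a single, sortable type
as_text = df[categorical_cols].astype(str)

# Pass each column (a 1-D Series) to its own encoder, not the whole frame
encoded = as_text.apply(lambda col: encoders[col.name].fit_transform(col))

# Reverse the encoding later using the stored encoders
decoded = encoded.apply(lambda col: encoders[col.name].inverse_transform(col))
```

The key difference from the code in the question is that the lambda passes col (one column) to fit_transform rather than the full multi-column selection.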

Sy Ker
  • I received the same error - ValueError: bad input shape (490546, 11) – Lostsoul Aug 27 '20 at 19:06
  • Thank you for the update. Is there a better way to do what I'm trying to do? I just have a list of categorical values and I want to encode them all (then reverse the encoding later). When I ran your code I got this error - TypeError: argument must be a string or number – Lostsoul Aug 27 '20 at 19:28
  • Okay, can you provide a small example of your dataset? – Sy Ker Aug 27 '20 at 21:12