I have a list of columns in my large dataframe that are categorical, and I'm trying to encode them because some of the algorithms I'm using do not accept strings (KNN, for example).

Here's my code:

#encode categories
from sklearn.preprocessing import LabelEncoder
# LabelEncoder
le = LabelEncoder()
# dataImputed[catgoricalValues] = dataImputed[catgoricalValues].apply(le.fit_transform) #didn't work
dataImputed[catgoricalValues] = le.fit_transform(dataImputed[catgoricalValues].astype(str))

I got this error:

ValueError: y should be a 1d array, got an array of shape (490546, 11) instead.

How can I encode only the columns in my catgoricalValues list while leaving all other columns in my dataframe unchanged?

Lostsoul
  • [see this answer](https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn) – RakeshV Jul 20 '20 at 18:04
  • @RakeshV I tried that, see the commented-out line in my code. It didn't work: it complained about a mix of strings and floats, so I used the line below it to ensure everything was str. – Lostsoul Jul 20 '20 at 18:07

1 Answer

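LabelEncoder expects a 1-D array, so it cannot take all 11 columns at once. Fit a separate encoder on each column instead. Try this:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

def MultiLabelEncoder(columnlist, dataframe):
    # LabelEncoder works on one 1-D column at a time, so loop over the columns.
    for i in columnlist:
        labelencoder_X = LabelEncoder()
        # Cast to str first so columns that mix strings and floats encode without errors.
        dataframe[i] = labelencoder_X.fit_transform(dataframe[i].astype(str))

# Encodes the listed columns in place; all other columns are left untouched.
MultiLabelEncoder(catgoricalValues, dataImputed)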
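
If you also need to map the codes back to the original labels later, one option is to keep each fitted encoder. The following is only a sketch of that idea on made-up data: the helper name `encode_columns`, the toy dataframe, and its column names are all hypothetical and not taken from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_columns(dataframe, columnlist):
    """Label-encode the given columns in place and return the fitted encoders."""
    encoders = {}
    for col in columnlist:
        le = LabelEncoder()
        # Cast to str so a column mixing strings and floats can be sorted and encoded.
        dataframe[col] = le.fit_transform(dataframe[col].astype(str))
        encoders[col] = le  # keep the encoder so the codes can be decoded later
    return encoders

# Toy data (hypothetical): "color" deliberately mixes strings and a float.
df = pd.DataFrame({
    "color": ["red", 3.0, "blue", "red"],
    "size": ["S", "M", "L", "M"],
    "price": [1.5, 2.0, 3.25, 4.0],
})

encoders = encode_columns(df, ["color", "size"])
print(df)  # "color" and "size" are now integer codes; "price" is unchanged
```

Calling `encoders["size"].inverse_transform(df["size"])` should then return the original labels (as strings, because of the cast). Note that scikit-learn documents LabelEncoder as a tool for target labels; `OrdinalEncoder` accepts 2-D input and may be a better fit for feature columns.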
RakeshV