Below is part of my code for label encoding. Applying LabelEncoder to one column of the DataFrame at a time worked fine, but when I tried to apply it to all the categorical features at once, sklearn throws
ValueError: bad input shape (600000, 24)
I can't find a specific reason for this.
import pandas as pd
from sklearn import preprocessing

df = pd.read_csv("../inputs/cat-in-the-dat-train-folds.csv")
# extracting the categorical features
cat_features = [x for x in df.columns if x not in ( "id", "target", "kflod")]
for col in cat_features:
    # fill missing values before casting: astype(str) can turn NaN into the string "nan"
    df.loc[:, col] = df[col].fillna("NONE").astype(str)
df_train = df[df["kfold"] != fold].reset_index(drop=True)
df_valid = df[df["kfold"] == fold].reset_index(drop=True)
lbl_enc = preprocessing.LabelEncoder()
full_cat_data = pd.concat(
    [df_train[cat_features], df_valid[cat_features]],
    axis=0)
lbl_enc.fit(full_cat_data)
x_train = lbl_enc.transform(df_train[cat_features])
x_valid = lbl_enc.transform(df_valid[cat_features])
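For reference, LabelEncoder's fit and transform take a 1-D array of shape (n_samples,), which is consistent with one column working and a 24-column frame being rejected. A minimal sketch of fitting one encoder per column follows. The toy DataFrame, its values, the `fold = 0` setting and the `encoders` dict are assumptions made up for illustration, not the real dataset. Note also that the exclusion tuple in the snippet spells "kflod", so if the column is really named "kfold" it would remain among the features.

```python
import pandas as pd
from sklearn import preprocessing

# Toy stand-in for the real data; column names and values are made up.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "bin_0": ["T", "F", None, "T"],
    "nom_0": ["Red", "Blue", "Red", None],
    "target": [0, 1, 0, 1],
    "kfold": [0, 0, 1, 1],
})
fold = 0  # hypothetical validation fold

cat_features = [c for c in df.columns if c not in ("id", "target", "kfold")]

# Fill missing values first, then cast to str, so missing entries become "NONE".
for col in cat_features:
    df[col] = df[col].fillna("NONE").astype(str)

df_train = df[df["kfold"] != fold].reset_index(drop=True)
df_valid = df[df["kfold"] == fold].reset_index(drop=True)

# LabelEncoder works on a 1-D array, so fit a separate encoder per column.
encoders = {}
for col in cat_features:
    lbl_enc = preprocessing.LabelEncoder()
    lbl_enc.fit(df[col])  # df holds both train and valid rows for this column
    df_train[col] = lbl_enc.transform(df_train[col])
    df_valid[col] = lbl_enc.transform(df_valid[col])
    encoders[col] = lbl_enc  # keep the fitted encoder for later reuse

x_train = df_train[cat_features].values
x_valid = df_valid[cat_features].values
```

If a single object that handles all columns at once is preferred, sklearn's preprocessing.OrdinalEncoder accepts 2-D input and may be a closer fit for encoding features than LabelEncoder, which is meant for target labels.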