Pass axis=1 to call LabelEncoder().fit_transform once for each row.
(By default, df.apply(func) calls func once for each column.)
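    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
                            ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
    encoder = LabelEncoder()
    # axis=1 refits the encoder on each row.
    # result_type="broadcast" keeps the original index and columns; without it,
    # newer pandas versions return a Series of arrays instead of a DataFrame.
    df = df.apply(encoder.fit_transform, axis=1, result_type="broadcast")
    print(df)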
yields
       a  b  c
    0  1  2  0
    1  2  1  0
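To make the effect of axis concrete, here is a minimal sketch on a small numeric frame (the data and the lambda functions are made up for illustration). It only shows which slice of the frame func receives in each case:

```python
import pandas as pd

# Hypothetical numeric data, chosen so the sums are easy to check by hand.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

# Default (axis=0): func is called once per column and receives that column as a Series.
col_sums = df.apply(lambda col: col.sum())

# axis=1: func is called once per row and receives that row as a Series.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums.tolist())  # one value per column
print(row_sums.tolist())  # one value per row
```

The first result is indexed by the column labels and the second by the row labels, which is why axis=1 is the setting needed to encode each row separately.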
Alternatively, you could convert the data to category
dtype and use the category codes as labels:
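    import pandas as pd

    df = pd.DataFrame(data=[["France", "Italy", "Belgium"],
                            ["Italy", "France", "Belgium"]], columns=["a", "b", "c"])
    # Stack into one long Series so all values share a single set of categories.
    stacked = df.stack().astype('category')
    # Replace each value with its integer category code, then restore the shape.
    result = stacked.cat.codes.unstack()
    print(result)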
also yields
       a  b  c
    0  1  2  0
    1  2  1  0
This should be significantly faster, since it does not call encoder.fit_transform
once for each row (which can perform terribly when there are many rows).
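One rough way to check the speed claim on your own machine is sketched below. The test frame (1000 rows, each a random ordering of the same three countries) is an assumption made for illustration, and the explicit per-row loop stands in for df.apply(..., axis=1). Note that the two approaches agree here only because every row contains the same set of values: the category codes come from a single mapping over the whole frame, whereas LabelEncoder is refit on each row.

```python
import time

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical test data: each row is a random ordering of the same three values.
rng = np.random.default_rng(0)
countries = np.array(["France", "Italy", "Belgium"])
data = np.array([rng.permutation(countries) for _ in range(1000)])
df = pd.DataFrame(data, columns=["a", "b", "c"])

# Approach 1: refit a LabelEncoder on every row.
start = time.perf_counter()
encoder = LabelEncoder()
per_row = pd.DataFrame(
    [encoder.fit_transform(row) for row in df.to_numpy()],
    index=df.index,
    columns=df.columns,
)
per_row_seconds = time.perf_counter() - start

# Approach 2: one categorical conversion over the stacked values.
start = time.perf_counter()
codes = df.stack().astype("category").cat.codes.unstack()
category_seconds = time.perf_counter() - start

print(f"per-row LabelEncoder: {per_row_seconds:.4f} s")
print(f"category codes:       {category_seconds:.4f} s")
```

The timings depend on hardware and library versions, so treat them as indicative only. For steadier numbers, repeat each approach several times (for example with timeit).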