I have a dataframe:
import pandas as pd
train_df = pd.DataFrame({'home':['A','A','B','C','C'],'dest':['X','Y','Y','X','Y']})
If I do:
train_df[['home','dest']] = train_df[['home','dest']].astype('category')
from sklearn.preprocessing import OneHotEncoder
onehotenc = OneHotEncoder(handle_unknown='ignore')
encoded_df = pd.DataFrame(onehotenc.fit_transform(train_df[['home','dest']]).toarray())
encoded_df.columns = onehotenc.get_feature_names_out()
train_df = train_df.join(encoded_df)
I get train_df with the encoded_df columns appended on the right, as expected. However, if I do the same thing inside a for loop:
for df in [train_df]:
    df[['home','dest']] = df[['home','dest']].astype('category')
    from sklearn.preprocessing import OneHotEncoder
    onehotenc = OneHotEncoder(handle_unknown='ignore')
    encoded_df = pd.DataFrame(onehotenc.fit_transform(df[['home','dest']]).toarray())
    encoded_df.columns = onehotenc.get_feature_names_out()
    df = df.join(encoded_df)
train_df is the same as before. Why does the assignment not work in the for loop case? I need to apply the same encoding to multiple dataframes and add the encoded columns to each of them. How can I do this in a for loop?