I have a dataframe:

import pandas as pd
train_df = pd.DataFrame({'home':['A','A','B','C','C'],'dest':['X','Y','Y','X','Y']})

If I do:

train_df[['home','dest']] = train_df[['home','dest']].astype('category')
    
from sklearn.preprocessing import OneHotEncoder
onehotenc = OneHotEncoder(handle_unknown='ignore')   
 
encoded_df = pd.DataFrame(onehotenc.fit_transform(train_df[['home','dest']]).toarray())
encoded_df.columns = onehotenc.get_feature_names_out()
train_df = train_df.join(encoded_df)

I get train_df with the encoded_df columns added on the right, as expected. However, if I do:

for df in [train_df]:
  df[['home','dest']] = df[['home','dest']].astype('category')

  from sklearn.preprocessing import OneHotEncoder
  onehotenc = OneHotEncoder(handle_unknown='ignore')

  encoded_df = pd.DataFrame(onehotenc.fit_transform(df[['home','dest']]).toarray())
  encoded_df.columns = onehotenc.get_feature_names_out()
  df = df.join(encoded_df)

train_df is the same as before. Why does the assignment not work inside the for loop? I need to apply the same encoding to multiple dataframes and add the encoded columns to each of them. How can I do that in a for loop?

Tejas

1 Answer

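In Python, for ... in binds each object in the list to the loop variable. Assigning a new object to that variable, as df = df.join(encoded_df) does (join returns a new dataframe), only rebinds the name; the object in the list, and therefore train_df, is unchanged.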
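The difference between rebinding the loop variable and mutating the object it refers to can be seen without pandas. This is a minimal sketch in which plain lists stand in for dataframes; the names rebound and mutated are made up for illustration.

```python
# Two identical lists of lists; the inner lists stand in for dataframes.
rebound = [[1, 2], [3, 4]]
mutated = [[1, 2], [3, 4]]

for item in rebound:
    # `item + [0]` builds a NEW list; the assignment only rebinds the
    # loop variable, so the list stored in `rebound` is untouched.
    item = item + [0]

for item in mutated:
    # `append` changes the existing list in place, so the change is
    # visible through `mutated` as well.
    item.append(0)

print(rebound)  # [[1, 2], [3, 4]]
print(mutated)  # [[1, 2, 0], [3, 4, 0]]
```

In the question, df = df.join(encoded_df) corresponds to the first loop, while the column assignment df[['home','dest']] = ... corresponds to the second.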

You can either append the modified object to a new list, or replace the object in the list with it:

dfs = []

for df in [train_df]:
    ...
    dfs.append(df)

# or

dfs = [train_df]

for idx, df in enumerate(dfs):
    ...
    dfs[idx] = df
Ynjxsjmh
  • But I have been routinely doing modifications like df.drop('home',axis=1,inplace=True) or df['dest'][1] = 'P' within the for loop, and the changes reflect on the original variables (train_df). Is it only that I cannot do df = something? – Tejas Apr 13 '22 at 13:33
  • @Tejas Yes, see https://stackoverflow.com/a/25670170/10315163. – Ynjxsjmh Apr 13 '22 at 16:08