I'm trying to add dummy-variable columns to a list of DataFrames in a for loop. The code works as expected outside the loop, but not inside it. Here's a minimal working example. Note that printing df inside the loop shows the expected form, but df1 and df2, when printed after the loop, don't have the additional columns.

import pandas as pd
df1=pd.DataFrame({'A':['a','b','c']})
df2=pd.DataFrame({'A':['b','c','b']})
combine=[df1,df2]
for df in combine:
    df=pd.concat([df,pd.get_dummies(df['A'])],axis=1)
    print(df)
print(df1)
df1=pd.concat([df1,pd.get_dummies(df1['A'])],axis=1)
print(df1)

   A  a  b  c
0  a  1  0  0
1  b  0  1  0
2  c  0  0  1

   A  b  c
0  b  1  0
1  c  0  1
2  b  1  0

   A
0  a
1  b
2  c

   A  a  b  c
0  a  1  0  0
1  b  0  1  0
2  c  0  0  1

Thanks for all your help.

  • Can you also post your desired outcome? – Cleb Feb 01 '18 at 20:24
  • Your issue is [this](https://stackoverflow.com/questions/14814771/do-python-for-loops-work-by-reference). Use a different variable name either in iteration or in assignment. – ayhan Feb 01 '18 at 20:27

1 Answer

You see no change in df1 or df2 after the loop because inside the loop you only rebind the name df to a brand-new object, the result of

pd.concat([df,pd.get_dummies(df['A'])],axis=1)

The objects referenced by df1 and df2 are not themselves changed.
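The same rebinding behaviour can be seen with plain Python lists, which may make the point easier to check in isolation. This is only an illustrative sketch; the names a, b, items and x are made up for the example and are not from the question.

```python
# Rebinding a loop variable vs. mutating the object it refers to.
a = [1, 2]
b = [3, 4]
items = [a, b]

for x in items:
    # x + [0] builds a NEW list; the assignment only rebinds the name x.
    x = x + [0]

# Snapshot copies: a and b are untouched by the loop above.
after_rebinding = [list(a), list(b)]

for x in items:
    # append() changes the existing list object that a (or b) also refers to.
    x.append(0)

after_mutation = [list(a), list(b)]

print(after_rebinding)
print(after_mutation)
```

The first loop corresponds to df = pd.concat(...) in the question: the result is created, bound to the loop name, and then dropped on the next iteration.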

Try to replace

df = pd.concat([df,pd.get_dummies(df['A'])],axis=1)

with

df[df['A'].unique()] = pd.get_dummies(df['A'])

and you will see the difference.
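For reference, here is a sketch of the in-place approach applied to the DataFrames from the question. It is a variation on the line above, not the exact same code: it takes the column labels from the result of pd.get_dummies rather than from unique(), on the assumption that this is safer when the order of first appearance differs from the order of the dummy columns. The dtype=int argument is also an addition, used so that the values print as 0/1 as in the question's output.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', 'c']})
df2 = pd.DataFrame({'A': ['b', 'c', 'b']})
combine = [df1, df2]

for df in combine:
    # Build the dummy columns once; dtype=int keeps the 0/1 display.
    dummies = pd.get_dummies(df['A'], dtype=int)
    # Assign each column by its own label, modifying the existing
    # DataFrame object instead of rebinding the name df.
    for col in dummies.columns:
        df[col] = dummies[col]

print(df1)
print(df2)
```

If you would rather keep pd.concat, another option is to build a new list, for example combine = [pd.concat([df, pd.get_dummies(df['A'])], axis=1) for df in combine], and then unpack it with df1, df2 = combine.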

mr.tarsa