1

I have two dataframes that look the same and for both of them I want to add an additional column and then reorder the columns. Here is a sample of what I tried to accomplish this:

data=[[1,2],[3,4]]
cols=['col1','col2']

df1=pd.DataFrame(data,columns=cols)
df2=pd.DataFrame(data,columns=cols)

for df in [df1,df2]:
    df.loc[:,'col3']=[5,6]
    df=df.reindex(['col3','col2','col1'],axis=1)

print(df1)

      col1  col2  col3
    0     1     2     5
    1     3     4     6

print(df2)

      col1  col2  col3
    0     1     2     5
    1     3     4     6

The third column was added as expected but the columns are still in the original order. I expected them to be col3, col2, col1. When I tried this later on the reindex worked as expected:

df1=df1.reindex(['col3','col2','col1'],axis=1)

I'm sure there is an explanation to why the column gets added but the reindex is ignored in my first attempt, but I have not been able to find one. Does anyone know why this happens?

joe76
  • 13
  • 3

2 Answers2

1

This is because df in your for loop is a local variable. When you do df.loc[:,'col3']=[5,6], you do a modification to the thing df references, which therefore affects df1. However, doing df.reindex(['col3','col2','col1'],axis=1) does not modify the original DataFrame but creates a new copy of it, which is then assigned to the local variable df inside the for loop. However, df1 and df2 remain unchanged. To see this, you can try printing df at the end of the for loop. It should print the desired value you want for df2 (with the reindexing)

  • Thanks, I get it now. Just out of curiosity, what would be a good way to rewrite this so the original dataframes are updated? This works, but is probably not the most efficient way to do it: `def change(df): df.loc[:,'col3']=[5,6] df=df.reindex(['col3','col2','col1'],axis=1,copy=False) return df df1=change(df1) df2=change(df2)` – joe76 Apr 04 '22 at 15:14
1

This is due to the way assignments work in Python. When you called df.loc[:,'col3']=[5,6], Python (correctly) interpreted this as "change loc[:,'col3'] of the object called df to [5,6]". On the other hand, when you called df = df.reindex(['col3', 'col2', 'col1'], axis=1), you were expecting this to be interpreted as "replace the object called df with the reindexed dataframe". What it actually did was redefine the label df so that it refers to the reindexed dataframe (without changing the object that df used to refer to).

Ben Grossmann
  • 4,387
  • 1
  • 12
  • 16
  • I tried this as well but this does not work either as the reindex will not change the original dataframes (df1 and df2). – joe76 Apr 04 '22 at 15:08
  • You're right, my suggestion doesn't work, so I've removed it. Apparently, [it's impossible to reindex a dataframe in place](https://stackoverflow.com/q/56462088/2476977). So, there is no way to implement this change such that the original variables are updated; you'll have to overwrite them. – Ben Grossmann Apr 04 '22 at 15:22