3

I have a number of pandas.DataFrame objects and want to reorder the columns of all of them in a for loop, but it's not working. What I have is:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

Now, changing the name of the columns works:

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

df1.head()

prints (columns with letters instead of numbers):

          a         b         c         d         e
0  0.276383  0.655995  0.512101  0.793673  0.165763
1  0.841603  0.831268  0.776274  0.670846  0.847065
2  0.626632  0.448145  0.184613  0.763160  0.337947
3  0.502062  0.881765  0.154048  0.908834  0.669257
4  0.254717  0.538606  0.677790  0.088452  0.014447

However, changing the order of the columns is not working in the same way. The following loop:

for df in dfs:
    df = df[ [ 'e', 'd', 'c', 'b', 'a' ] ]

leaves the dataframes unchanged.

If I do it for each dataframe, outside the for loop, it works, though:

df1 = df1[ [ 'e', 'd', 'c', 'b', 'a' ] ]
df1.head()

prints the following:

          e         d         c         b         a
0  0.165763  0.793673  0.512101  0.655995  0.276383
1  0.847065  0.670846  0.776274  0.831268  0.841603
2  0.337947  0.763160  0.184613  0.448145  0.626632
3  0.669257  0.908834  0.154048  0.881765  0.502062
4  0.014447  0.088452  0.677790  0.538606  0.254717

Why can't I loop over the dataframes to change the column order?

How can I loop over the dataframes in the list to change the column order?


Working with Python 3.5.3, pandas 0.23.3

Luis
  • I find it really interesting. So basically you want to iterate over `dfs` list, and then see the changes done in the loop when calling `df1`, not `dfs[0]`, am I right? I am very curious, why modifying in the first loop (i.e. changing the columns' names) works this way, but rearranging columns doesn't. – pmarcol Jun 05 '19 at 12:36
  • @pmarcol Yeah, want to keep using `df1` for other things later in the code. – Luis Jun 05 '19 at 13:12
  • See my answer then :) – pmarcol Jun 05 '19 at 13:13

2 Answers

2

Use enumerate and remember to assign back into your list:

for i, df in enumerate(dfs):
    dfs[i] = df[['e', 'd', 'c', 'b', 'a']]
Chris Adams
  • This doesn't seem to change the column order in the original dataframes either (`df1`, `df2`). – Luis Jun 05 '19 at 13:20
  • No, it won't change the original object `inplace`. You would likely need the inverse of your `dfs` assignment operation - `df1, df2 = dfs` after the `for` loop – Chris Adams Jun 05 '19 at 13:35
  • Indeed, that would do the trick. @ChrisA Would you mind adding that to your answer? I think it's worth it. – pmarcol Jun 05 '19 at 13:38
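
The unpacking suggested in the comments above can be sketched as follows. This is only an illustration: the name `new_order` is introduced here and the column names are set at construction time for brevity. Note that it rebinds the names `df1` and `df2` to the reordered dataframes rather than changing the original objects, so any other reference to the originals would still see the old order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
df2 = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
dfs = [df1, df2]

new_order = ["e", "d", "c", "b", "a"]  # name introduced for this sketch

# Store each reordered copy back into the list ...
for i, df in enumerate(dfs):
    dfs[i] = df[new_order]

# ... then rebind the original names to the new objects.
df1, df2 = dfs

print(list(df1.columns))
```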
2

I've spent a while on it, it actually gave me a nice puzzle.
It works this way because in your first loop you modify the existing objects, while in the second loop df[...] creates a new object and the assignment only rebinds the loop variable df to it; the list dfs and the names df1 and df2 still point to the original, unchanged dataframes. If you want the changes from the second loop to show up in df1 and df2, you can only use methods that operate on the original dataframe in place and do not require reassignment.
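
The difference between modifying an object and rebinding a name can be checked with `is`. A small sketch, where the variable names `same_before` and `same_after` are introduced only for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
dfs = [df1]

for df in dfs:
    same_before = df is df1             # loop variable and df1 name the same object
    df = df[["e", "d", "c", "b", "a"]]  # builds a new DataFrame and rebinds only `df`
    same_after = df is df1              # `df` now names a different object

print(same_before, same_after)  # True False
print(list(df1.columns))        # ['a', 'b', 'c', 'd', 'e']
print(dfs[0] is df1)            # True
```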
I'm not convinced that my way is the optimal one, but this is what I mean:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5))
df2 = pd.DataFrame(np.random.rand(5, 5))

dfs = [ df1, df2 ]

for df in dfs:
    df.columns = [ 'a', 'b', 'c', 'd', 'e' ]

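for df in dfs:
    # Append a copy of each column at the end, in the desired order
    for c in ['e', 'd', 'c', 'b', 'a']:
        df.insert(df.shape[1], c + '_new', df[c])
    # Remove the original columns in place
    for c in ['a', 'b', 'c', 'd', 'e']:
        del df[c]
    # Give the copies the original names, now in the new order
    df.columns = ['e', 'd', 'c', 'b', 'a']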

Then calling df1 prints:

           e           d           c           b           a
0   0.550885    0.879557    0.202626    0.218867    0.266057
1   0.344012    0.767083    0.139642    0.685141    0.559385
2   0.271689    0.247322    0.749676    0.903162    0.680389
3   0.643675    0.317681    0.217223    0.776192    0.665542
4   0.480441    0.981850    0.558303    0.780569    0.484447
pmarcol
  • Your explanation is good, about modify and overwrite objects... It'd be good to find a nicer way to rearrange the columns, though... – Luis Jun 05 '19 at 13:21
  • Yes, I would also like to see a less hacky way, but I couldn't find a method that would not require overwriting the original object. – pmarcol Jun 05 '19 at 13:28
  • If you stumble upon some `reindex` or similar, remember to come back and edit your answer ;) – Luis Jun 05 '19 at 13:33
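
One possibly shorter in-place variant, offered as a sketch rather than a definitive answer, moves each column to its target position with `DataFrame.pop` and `DataFrame.insert`, both of which modify the dataframe they are called on. The helper name `reorder_columns_inplace` and the variable `new_order` are hypothetical, and the sketch assumes the column labels are unique and that `order` lists each column exactly once.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
df2 = pd.DataFrame(np.random.rand(5, 5), columns=list("abcde"))
dfs = [df1, df2]

new_order = ["e", "d", "c", "b", "a"]  # name introduced for this sketch


def reorder_columns_inplace(df, order):
    """Hypothetical helper: reorder the columns of df without creating a new DataFrame.

    Assumes unique column labels and that `order` names every column once.
    """
    for position, name in enumerate(order):
        column = df.pop(name)              # removes the column from df and returns it
        df.insert(position, name, column)  # puts it back at its target position


for df in dfs:
    reorder_columns_inplace(df, new_order)

print(list(df1.columns))
```

Because the objects themselves are changed, `df1` and `df2` show the new order without any reassignment.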