5

I have a pandas dataframe df with 4 columns. For example here is a toy example:

foo1    foo2     foo3  foo4
egg     cheese   2     1
apple   pear     1     3
french  spanish  10    1

The columns are foo1, foo2, foo3 and foo4

I would like to swap columns foo1 and foo2 and also swap columns foo3 and foo4 when foo3 < foo4. So the result would be:

foo1     foo2    foo3  foo4
cheese   egg     1     2
apple    pear    1     3
spanish  french  1     10

I can find the rows that need swapping with df[df['foo3'] < df['foo4']] but how can I do the swapping efficiently. My dataframe is large.

Kartik
  • 8,347
  • 39
  • 73
Simd
  • 19,447
  • 42
  • 136
  • 271
  • http://stackoverflow.com/questions/24036911/how-to-update-values-in-a-specific-row-in-a-python-pandas-dataframe – postelrich Sep 24 '16 at 03:07

2 Answers2

7

You can find the rows with df[df['foo3'] < df['foo4']], yes, but if you use the Boolean series instead, you can easily accomplish your goal:

s = df['foo3'] < df['foo4']
df.loc[s, ['foo1','foo2']] = df.loc[s, ['foo2','foo1']].values
df.loc[s, ['foo3','foo4']] = df.loc[s, ['foo4','foo3']].values

Note, you need the .values at the end of the RHS to prevent Pandas from aligning on column names, which will undermine the purpose.

Kartik
  • 8,347
  • 39
  • 73
  • Can I swap foo3 and foo4 at the same time? – Simd Sep 24 '16 at 07:22
  • That's possible, yes. Go python simultaneous assignment!!! Why don't you try it and let me know if you hit a bump..? – Kartik Sep 24 '16 at 07:24
  • I needed to de-Series-ify the row index `s` for this method, for example `s=list(df['foo3'] < df['foo4']`, otherwise I get "unalignable index" error – stevemo Jul 30 '19 at 18:30
3

You can use pandas.Series.where function to construct new data frame based on the condition:

pairs = [('foo1', 'foo2'), ('foo3', 'foo4')]  # construct pairs of columns that need to swapped

df_out = pd.DataFrame() 

# for each pair, swap the values if foo3 < foo4
for l, r in pairs:
    df_out[l] = df[l].where(df.foo3 < df.foo4, df[r])
    df_out[r] = df[r].where(df.foo3 < df.foo4, df[l])

df_out
#     foo1   foo2   foo3  foo4
#0  cheese    egg      1     2
#1   apple   pear      1     3
#2 spanish french      1    10
Psidom
  • 209,562
  • 33
  • 339
  • 356