Modifying multiple columns in a subset of rows in pandas DataFrame

Question

I've got a pandas DataFrame. In this DataFrame I want to modify several columns of some rows. These are the approaches I've attempted.

df[['finalA', 'finalB']] = df[['A', 'B']]
exceptions = df.loc[df.normal == False]

Which works like a charm, but now I want to set the exceptions:

df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']]

Which doesn't work. So I tried using .ix from this answer.

df.ix[exceptions.index, ['finalA', 'finalB']] = \
  df.ix[exceptions.index, ['A_except', 'B_except']]

Which doesn't work either. Both methods give me NaN in both finalA and finalB for the exceptional rows.

The only way that seems to work is doing it one column at a time:

df.ix[exceptions.index, 'finalA'] = \
  df.ix[exceptions.index, 'A_except']
df.ix[exceptions.index, 'finalB'] = \
  df.ix[exceptions.index, 'B_except']

What's going on here in pandas? How do I avoid setting the values to the copy that is apparently made by selecting multiple columns? Is there a way to avoid this kind of code repetition?

Some more musings: It doesn't actually set the values to a copy of the dataframe, it sets the values to NaN. It actually overwrites them to a new value.

Sample dataframe:

import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4],
                   'B': [5,6,7,8],
                   'normal': [True, True, False, False],
                   'A_except': [0,0,9,9],
                   'B_except': [0,0,10,10]})

Result:

    A   A_except    B   B_except    normal  finalA  finalB
0   1   0           5   0           True    1.0     5.0
1   2   0           6   0           True    2.0     6.0
2   3   9           7   10          False   NaN     NaN
3   4   9           8   10          False   NaN     NaN

Expected result:

    A   A_except    B   B_except    normal  finalA  finalB
0   1   0           5   0           True    1       5
1   2   0           6   0           True    2       6
2   3   9           7   10          False   9       10
3   4   9           8   10          False   9       10

score 4 · Accepted Answer · answered Mar 02 '17 at 11:24

You can rename column names for align:

d = {'A_except':'finalA', 'B_except':'finalB'}
df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']].rename(columns=d)

print (df)
   A  A_except  B  B_except normal  finalA  finalB
0  1         0  5         0   True       1       5
1  2         0  6         0   True       2       6
2  3         9  7        10  False       9      10
3  4         9  8        10  False       9      10

Another solution is convert output to numpy array, but columns dont align:

df.loc[exceptions.index, ['finalA', 'finalB']] = \
  df.loc[exceptions.index, ['A_except', 'B_except']].values

print (df)
   A  A_except  B  B_except normal  finalA  finalB
0  1         0  5         0   True       1       5
1  2         0  6         0   True       2       6
2  3         9  7        10  False       9      10
3  4         9  8        10  False       9      10

score 2 · Answer 2 · answered Mar 02 '17 at 11:25

If you view both sides of the equations, you will notice that the columns differ. Pandas takes the labels of the columns into account, and since they don't match, wont insert the value.

It works for a single column because then you are extracting a Series, and then the column label no longer applies.

A quick solution would be the simply strip the DataFrame to a bare array, then both the loc and ix method work:

df.loc[exceptions.index, ['finalA', 'finalB']] = 
  df.loc[exceptions.index, ['A_except', 'B_except']].values

But keep in mind that doing this will eliminate Pandas attempt to match column and index labels, its basically a 'hard' insert. So that makes you as a user responsible for the proper alignment. Which in this case is not a problem, but something to be aware of in general.

Modifying multiple columns in a subset of rows in pandas DataFrame

2 Answers2