1

There are a plethora of questions on SO about how to select rows in a DataFrame and replace values in a column in those rows, but one use case is missing. To use the example DataFrame from this question,

In [1]: df
Out[1]:
  apple banana cherry
0     0      3   good
1     1      4    bad
2     2      5   good

And this works if one wants to change a single column based on another:

df.loc[df.cherry == 'bad', 'apple'] = df.banana * 2

Or this sets the values in two columns:

df.loc[df.cherry == 'bad', ['apple', 'banana'] = np.nan

But this doesn't work:

df.loc[df.cherry == 'bad', ['apple', 'banana'] = [df.banana, df.apple]

, because apparently the right side is 3x2, while the left side is 1x2, hence the error message

ValueError: Must have equal len keys and value when setting with an ndarray

So I understand what the problem is, but what is the solution?

Community
  • 1
  • 1
David Nemeskey
  • 640
  • 1
  • 5
  • 16

1 Answers1

2

IIUC you can try:

df['a'] = df.apple * 3
df['b'] = df.banana * 2
print df
   apple  banana cherry  a   b
0      0       3   good  0   6
1      1       4    bad  3   8
2      2       5   good  6  10

df[['a', 'b']] = df.loc[df.cherry == 'bad', ['apple', 'banana']] 
print df
   apple  banana cherry    a    b
0      0       3   good  NaN  NaN
1      1       4    bad  1.0  4.0
2      2       5   good  NaN  NaN

Or use conditions with values:

df['a'] = df.apple * 3
df['b'] = df.banana * 2

df.loc[df.cherry == 'bad', ['apple', 'banana']] = 
df.loc[df.cherry == 'bad', ['a', 'b']].values
print df
   apple  banana cherry  a   b
0      0       3   good  0   6
1      3       8    bad  3   8
2      2       5   good  6  10

Another options with original columns:

print df[['apple','banana']].shift() * 2
   apple  banana
0    NaN     NaN
1   12.0     6.0
2    2.0     8.0

df.loc[df.cherry == 'bad', ['apple', 'banana']] = df[['apple','banana']].shift() * 2
print df
   apple  banana cherry
0    6.0     3.0   good
1   12.0     6.0    bad
2    2.0     5.0   good
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks! The second solution is the one I was looking for. So basically in my case, it would be `df.loc[df.cherry == 'bad', ['apple', 'banana']] = df.loc[df.cherry == 'bad', ['banana', 'apple']].values`. Interestingly enough, it doesn't work without the `values`, even though I inverted the column names. Anyway, just one more question: is there a way without adding new columns to `df` if I want something special (e.g. `df.banana * 2`)? Maybe with `apply`? – David Nemeskey Mar 22 '16 at 08:58
  • Thanks, that looks good -- so basically I could do sth like `df.loc[...] = pd.concat([df.banana * 3, df.apple * 2], axis=1).values`. Apparently, values was the magic bullet I was looking for, thanks! – David Nemeskey Mar 22 '16 at 09:49