62

I have a dataframe df like this:

      A      B       C            D
1   blue    red    square        NaN
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN

and I want to update column D when it meets 3 conditions. Ex:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='square'), ['D'] ] = 'succeed'

It works for the first two conditions, but it doesn't work for the third, thus:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='triangle'), ['D'] ] = 'succeed'

has exactly the same result:

      A      B       C            D
1   blue    red    square        succeed
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN
cottontail
  • 10,268
  • 18
  • 50
  • 51
Eduardo Oliveira
  • 631
  • 1
  • 6
  • 4
  • Use the solution in [this answer](https://stackoverflow.com/a/73687100/19123103) if the other solutions are slow. – cottontail Nov 16 '22 at 20:06

6 Answers6

79

Using:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'

gives the warning:

/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

A better way of achieving this seems to be:

df.loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square'),'D'] = 'M5'
Ena
  • 3,481
  • 36
  • 34
Praveen
  • 2,137
  • 1
  • 18
  • 21
28

You could try this instead:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'
Tim
  • 2,134
  • 3
  • 26
  • 40
6

You could try:

df['D'] = np.where((df.A=='blue') & (df.B=='red') & (df.C=='square'), 'succeed')

This answer might provide a detailed answer to the your question: Update row values where certain condition is met in pandas

Aryan Firouzian
  • 1,940
  • 5
  • 27
  • 41
theSanjeev
  • 149
  • 2
  • 10
4

This format might have been implied in the new answers, but the following bit actually worked for me.

df['D'].loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square')] = 'succeed'

3

The third parameter of logical_and is to assign the array used to store the result.

Currently, the method @TimRich provided might be the best. In pandas 0.13 (in development), there's a new experimental query method. Try it!

waitingkuo
  • 89,478
  • 28
  • 112
  • 118
0

The existing solutions are very slow for large dataframes (100k+ rows); an alternative is to try numexpr evaluation with eval() method to build a boolean mask and use this mask to replace values using mask() method.

df['D'] = df['D'].mask(df.eval("A=='blue' and B=='red' and C=='square'"), 'succeed')

As the length of the dataframe increases eval() becomes much faster than the other alternatives. For example, for a frame with 1mil rows, it is 2.2 times faster than loc method outlined in Praveen, Tim and Alex Schwab's answers.

Yet another method is to use numpy.where() method to select values according to a condition.

df['D'] = np.where((df.A=='blue') & (df.B=='red') & (df.C=='square'), 'succeed', pd.NA)

This is similar to theSanjeev's answer; the only difference is to set the "else"-value that is missing in their answer. Speaking of missing values, pd.NA is also faster than np.nan or float('nan') as well.

cottontail
  • 10,268
  • 18
  • 50
  • 51