How to apply a function to a column in Pandas depending on the value in another column?

Question

Thank you in advance for reading.

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'Words':[{'Sec': ['level']},{'Sec': ['levels']},{'Sec': ['level']},{'Und': ['ba ']},{'Pro': ['conf'],'ProAbb': ['cth']}],'Conflict':[None,None,None,None,'Match Conflict']})


         Conflict                                     Words
0            None                      {u'Sec': [u'level']}
1            None                     {u'Sec': [u'levels']}
2            None                      {u'Sec': [u'level']}
3            None                        {u'Und': [u'ba ']}
4  Match Conflict  {u'ProAbb': [u'cth'], u'Pro': [u'conf']}

I want to apply a routine that, for each element in 'Words', checks whether Conflict == 'Match Conflict' and, if so, applies some function to the value in 'Words'.

For instance, using the following placeholder function:

def func(x):
    x = x.clear()
    return x
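A quick check of what this placeholder actually does: `dict.clear()` empties the dictionary in place and returns `None`, so `func` both mutates the dict it receives (the same object that is stored in the DataFrame) and returns `None`. The sample dict below is simply the one from row 4.

```python
def func(x):
    x = x.clear()  # dict.clear() empties x in place and returns None
    return x

words = {'Pro': ['conf'], 'ProAbb': ['cth']}
result = func(words)

print(result)  # None
print(words)   # {}
```

Because of that side effect, calling this `func` on rows you meant to keep would empty their dicts as well, so it should only ever see the selected rows.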

I write:

df['Words'] = df[df['Conflict'] == 'Match Conflict']['Words'].apply(lambda x: func(x))

My expected output is:

         Conflict                                     Words
0            None                      {u'Sec': [u'level']}
1            None                     {u'Sec': [u'levels']}
2            None                      {u'Sec': [u'level']}
3            None                        {u'Und': [u'ba ']}
4  Match Conflict                                        None

Instead I get:

         Conflict Words
0            None   NaN
1            None   NaN
2            None   NaN
3            None   NaN
4  Match Conflict  None

The function is applied only to the row which has Conflict = 'Match Conflict', but at the expense of the other rows, which all become NaN. I assumed the other rows would be left untouched; obviously this is not the case.
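The NaN values come from index alignment. The filtered Series on the right-hand side only carries index label 4, and assigning a Series to a column aligns it on the DataFrame's index, so labels 0 to 3 have no matching value and are filled with NaN. A minimal sketch reproducing this with the question's data (run under Python 3, hence no `u''` prefixes):

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def func(x):
    x = x.clear()  # empties the dict in place, returns None
    return x

# Filtering first leaves a Series that only has index label 4
subset = df[df['Conflict'] == 'Match Conflict']['Words'].apply(func)
print(subset.index.tolist())  # [4]

# Column assignment aligns on the index: labels 0-3 are missing from
# `subset`, so they are filled with NaN
df['Words'] = subset
print(df['Words'].tolist())
```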

Can you explain how I might achieve my desired output without dropping all of the information in the Words column? I believe the answer may lie with np.where, but I have not been able to make it work; this was the best I could come up with.

Any help much appreciated. Thanks.

`df['Words'] = #anything` overwrites the `Words` column. So this is behaving exactly as you asked it to. — Paul H, Jan 31 '17 at 22:21
@PaulH Appreciate the feedback. I tried to apply what little I knew and this was as far as it got me. I am glad for your and Psidom's assistance. — Chuck, Jan 31 '17 at 22:30

score 4 · Accepted Answer · answered Jan 31 '17 at 22:27

You can try to update only those rows that match the condition using .loc:

mask = df['Conflict'] == 'Match Conflict'
df.loc[mask, 'Words'] = df.loc[mask, 'Words'].apply(func)
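A self-contained sketch of the `.loc` approach with the question's data. Since the real function is more substantial than the placeholder, `strip_abbrevs` below is a hypothetical stand-in that returns a new dict without the keys ending in `'Abb'`; any function that takes one `Words` value and returns a new one fits the same pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def strip_abbrevs(words):
    # Hypothetical transformation: build a new dict, leave the original alone
    return {k: v for k, v in words.items() if not k.endswith('Abb')}

mask = df['Conflict'] == 'Match Conflict'

# Only the rows selected by `mask` are read, transformed and written back;
# the other rows of 'Words' are never touched
df.loc[mask, 'Words'] = df.loc[mask, 'Words'].apply(strip_abbrevs)

print(df['Words'].tolist())
```

Because the left-hand side uses the same `mask`, the assignment also aligns on index label 4 only, which is why rows 0 to 3 survive here but not in the question's version.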

Psidom

Thank you very much for your help. I can adapt this and @Paul H's answer to do everything I need. Really appreciate it. – Chuck Jan 31 '17 at 22:32
Just wanted to come back and say thanks again. I implemented this in production today and it has resolved an entire afternoon's worth of trouble. Thanks – Chuck Feb 01 '17 at 09:49

score 3 · Answer 2 · answered Jan 31 '17 at 22:26

You should rewrite the function to work with all of your rows:

def func(x, match):
    if x['Conflict'] == match:
        return None
    return x['Words']

df['Words'] = df.apply(lambda row: func(row, 'Match Conflict'), axis=1)
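A sketch of the same row-wise idea that passes the transformation in as an argument, so the placeholder can later be swapped for the real function. `resolve` and `strip_abbrevs` are hypothetical names; `DataFrame.apply` forwards the extra positional arguments in `args` after the row. Note that row-wise `apply` calls Python code on every row (building a Series for each one), which tends to be slower than transforming only the selected rows with `.loc`.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def strip_abbrevs(words):
    # Hypothetical transformation standing in for the real function
    return {k: v for k, v in words.items() if not k.endswith('Abb')}

def resolve(row, match, transform):
    # Transform 'Words' only on matching rows; return other values unchanged
    if row['Conflict'] == match:
        return transform(row['Words'])
    return row['Words']

df['Words'] = df.apply(resolve, axis=1, args=('Match Conflict', strip_abbrevs))
print(df['Words'].tolist())
```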

Paul H

Thank you very much for your help, Paul :) This is extremely useful. I was pulling my hair out all day. – Chuck Jan 31 '17 at 22:32

score 2 · Answer 3 · answered Jan 31 '17 at 22:34

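You can also use `where`, as you suggested (here the pandas `Series.where` method rather than `np.where`), which keeps the values where the condition is True and replaces the others: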

condition = df.Conflict != 'Match Conflict'
df['Words'] = df.Words.where(condition, None)

         Conflict                  Words
0            None   {u'Sec': [u'level']}
1            None  {u'Sec': [u'levels']}
2            None   {u'Sec': [u'level']}
3            None     {u'Und': [u'ba ']}
4  Match Conflict                   None
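Since the question mentions `np.where`, here is a NumPy-based sketch of the same selection, using a hypothetical pure function `strip_abbrevs` in place of the placeholder. Unlike `.loc`, `np.where` needs both candidate arrays in full, so the function is evaluated on every row; that is only safe when it does not modify its input in place (the question's `func` does, via `dict.clear()`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def strip_abbrevs(words):
    # Hypothetical pure transformation: returns a new dict
    return {k: v for k, v in words.items() if not k.endswith('Abb')}

matches = df['Conflict'] == 'Match Conflict'

# Both branches are computed for every row; np.where then picks the
# transformed value where `matches` is True and the original elsewhere
df['Words'] = np.where(matches, df['Words'].apply(strip_abbrevs), df['Words'])
print(df['Words'].tolist())
```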

gold_cy

Many thanks for your answer and input! What about implementing `where` with the function rather than just `None`, along the lines of `df['Words'] = df.Words.where(condition, #func())`? What would the syntax of this look like? (I ask because this function is just a placeholder; the real one is much more substantial.) – Chuck Jan 31 '17 at 22:37
The function would probably have to be modified, depending on what it is. – gold_cy Jan 31 '17 at 22:45
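One way to combine `where` with a real function is to pass a Series as the replacement: `Series.where` aligns `other` on the index, so it is enough to compute transformed values only for the rows where `condition` is False (the 'Match Conflict' rows). `strip_abbrevs` is a hypothetical stand-in for the real function; this is a sketch, not the only possible syntax.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def strip_abbrevs(words):
    # Hypothetical transformation standing in for the real function
    return {k: v for k, v in words.items() if not k.endswith('Abb')}

condition = df.Conflict != 'Match Conflict'

# Transform only the rows that `where` will replace (condition is False there)
replacements = df.Words[~condition].apply(strip_abbrevs)

# `where` keeps values where `condition` is True and takes the
# index-aligned value from `replacements` elsewhere
df['Words'] = df.Words.where(condition, replacements)
print(df['Words'].tolist())
```

Computing `replacements` from the filtered rows also means a function with side effects, like the `dict.clear()` placeholder, never sees the rows being kept.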

score 2 · Answer 4 · answered Jan 31 '17 at 22:38

Suppose a placeholder:

def func(x):
    x = x.clear()
    return x

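Then we can use boolean indexing with `.loc` and apply to transform the matching rows:

df.loc[df['Conflict'] == 'Match Conflict', 'Words'].apply(func)

This returns only the selected rows; to keep the result, assign it back through the same `.loc` selection on the left-hand side. (`.ix` was deprecated in pandas 0.20 and removed in 1.0; `.loc` is the label-based replacement.)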

I wanted to provide a concise one-liner but I was too late :,(

spicypumpkin

God I'm learning so much. Thank you for your input. Your answer led me to this question http://stackoverflow.com/questions/27667759/is-ix-always-better-than-loc-and-iloc-since-it-is-faster-and-supports-i and on and on. Thanks. – Chuck Jan 31 '17 at 22:41
Could you also have a function in place of `=='Match Conflict'` if you wanted to expand your criteria to something more stringent? – Chuck Jan 31 '17 at 22:43
I believe so. Boolean and callable indexing are detailed in the [documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html). I suppose you can write a function that returns a boolean Series and use it like `df.loc[bool_func(df.A), 'B']`. I've never tried it myself, though. – spicypumpkin Jan 31 '17 at 22:50
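A sketch of that idea with `.loc`, which accepts either a boolean Series or a callable that receives the DataFrame and returns one. `needs_fix`, its stricter criterion (a 'Match Conflict' row whose dict has more than one key), and `strip_abbrevs` are all hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Sec': ['levels']}, {'Sec': ['level']},
              {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, None, None, 'Match Conflict'],
})

def needs_fix(frame):
    # Hypothetical stricter criterion: 'Match Conflict' rows whose dict
    # holds more than one key
    return (frame['Conflict'] == 'Match Conflict') & (frame['Words'].map(len) > 1)

def strip_abbrevs(words):
    # Hypothetical transformation standing in for the real function
    return {k: v for k, v in words.items() if not k.endswith('Abb')}

# .loc calls needs_fix(df) and uses the boolean Series it returns
selected = df.loc[needs_fix, 'Words']

# Write back only to the labels that were selected
df.loc[selected.index, 'Words'] = selected.apply(strip_abbrevs)
print(df['Words'].tolist())
```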
