
Thank you in advance for reading.

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'Words':[{'Sec': ['level']},{'Sec': ['levels']},{'Sec': ['level']},{'Und': ['ba ']},{'Pro': ['conf'],'ProAbb': ['cth']}],'Conflict':[None,None,None,None,'Match Conflict']})


         Conflict                                     Words
0            None                      {u'Sec': [u'level']}
1            None                     {u'Sec': [u'levels']}
2            None                      {u'Sec': [u'level']}
3            None                        {u'Und': [u'ba ']}
4  Match Conflict  {u'ProAbb': [u'cth'], u'Pro': [u'conf']}

I want to apply a routine that, for each element in 'Words', checks if Conflict = 'Match Conflict' and if so, applies some function to the value in 'Words'.

For instance, using the following placeholder function:

def func(x):
    x = x.clear()
    return x

I write:

df['Words'] = df[df['Conflict'] == 'Match Conflict']['Words'].apply(lambda x: func(x))

My expected output is:

         Conflict                                     Words
0            None                      {u'Sec': [u'level']}
1            None                     {u'Sec': [u'levels']}
2            None                      {u'Sec': [u'level']}
3            None                        {u'Und': [u'ba ']}
4  Match Conflict                                        None

Instead I get:

         Conflict Words
0            None   NaN
1            None   NaN
2            None   NaN
3            None   NaN
4  Match Conflict  None

The function is applied only to the row where Conflict = 'Match Conflict', but at the expense of the other rows, which all become NaN. I assumed the other rows would be left untouched; obviously this is not the case.

Can you explain how I might achieve my desired output without dropping all of the information in the Words column? I believe the answer may lie with np.where, but I have not been able to make it work; the attempt above was the best I could come up with.

Any help much appreciated. Thanks.

Chuck
  • `df['Words'] = #anything` overwrites the `Words` column. So this is behaving exactly as you asked it to. – Paul H Jan 31 '17 at 22:21
  • @PaulH Appreciate the feedback. I tried to apply what little I knew and this was as far as it got me. I am glad for your and Psidom 's assistance. – Chuck Jan 31 '17 at 22:30
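
The NaN values in the question come from index alignment: the filtered Series only carries the labels of the matching rows, and assigning it to a whole column fills every other label with NaN. A minimal sketch of that behaviour, using a shortened version of the question's frame (the names `subset` and `overwritten` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Und': ['ba ']}, {'Pro': ['conf']}],
    'Conflict': [None, None, 'Match Conflict'],
})

# Filtering keeps only the matching row, so the result has a single label (2).
subset = df[df['Conflict'] == 'Match Conflict']['Words']

# Assigning a Series to a column aligns on the index: labels 0 and 1 are
# missing from `subset`, so those rows are filled with NaN.
overwritten = df.copy()
overwritten['Words'] = subset

print(overwritten)
```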

4 Answers


You can try to update only those rows that match the condition using .loc:

mask = df['Conflict'] == 'Match Conflict'
df.loc[mask, 'Words'] = df.loc[mask, 'Words'].apply(func)
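
As a self-contained illustration of the `.loc` approach, here is a sketch with a function that returns a new value rather than `None`. `upper_words` is a hypothetical stand-in for the real function, and the frame is a shortened version of the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, 'Match Conflict'],
})

def upper_words(words):
    """Hypothetical stand-in for the real function: upper-case every word."""
    return {key: [w.upper() for w in values] for key, values in words.items()}

# Build the boolean mask once and use it on both sides of the assignment,
# so only the matching rows are read and written.
mask = df['Conflict'] == 'Match Conflict'
df.loc[mask, 'Words'] = df.loc[mask, 'Words'].apply(upper_words)

print(df)
```

Rows where the mask is False are never touched, so their dictionaries stay exactly as they were.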


Psidom
  • Thank you very much for your help. I can adapt this and @Paul H 's answer to do everything I need. Really appreciate it. – Chuck Jan 31 '17 at 22:32
  • Just wanted to come back and say thanks again. I implemented this in production today and it has resolved an entire afternoon's worth of troubles. Thanks – Chuck Feb 01 '17 at 09:49

You should rewrite the function to work with all of your rows:

def func(x, match):
    if x['Conflict'] == match:
        return None
    return x['Words']

df['Words'] = df.apply(lambda row: func(row, 'Match Conflict'), axis=1)
Paul H
  • Thank you very much for your help Paul :) This is extremely useful. I was pulling my hair out all day. – Chuck Jan 31 '17 at 22:32

You can also use the Series method where, in the spirit of the np.where idea you mentioned:

condition = df.Conflict != 'Match Conflict'
df['Words'] = df.Words.where(condition, None)

         Conflict                  Words
0            None   {u'Sec': [u'level']}
1            None  {u'Sec': [u'levels']}
2            None   {u'Sec': [u'level']}
3            None     {u'Und': [u'ba ']}
4  Match Conflict                   None
gold_cy
  • Many Thanks to you for your answer and input! What about implementing `where` with the function, rather than just `None` a la: `df['Words'] = df.Words.where(condition, #func())` What would the syntax of this look like? (I ask this, because this function is just a placeholder, the real one is much more substantial) – Chuck Jan 31 '17 at 22:37
  • The function would have to be modified probably, depending on what it is. – gold_cy Jan 31 '17 at 22:45
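
One possible shape for the `where`-with-a-function idea raised in the comment above: `Series.where` accepts a Series as its replacement argument, so the function's results can be computed first and passed in. This is only a sketch; `upper_words` is a hypothetical stand-in for the real function, and note that here the function runs on every row, so it must not mutate its input.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, 'Match Conflict'],
})

def upper_words(words):
    """Hypothetical stand-in for the real function; returns a new dict."""
    return {key: [w.upper() for w in values] for key, values in words.items()}

# True for the rows that should keep their original value.
condition = df['Conflict'] != 'Match Conflict'

# Compute the replacement for every row, then let `where` pick the original
# value where `condition` is True and the transformed one elsewhere.
transformed = df['Words'].apply(upper_words)
df['Words'] = df['Words'].where(condition, transformed)

print(df)
```

If the real function is expensive or has side effects, the `.loc` form with a mask avoids calling it on rows that do not match.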

Suppose a placeholder function:

def func(x):
    x = x.clear()
    return x

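Then we can use boolean indexing and apply to compute the new values for the matching rows. (.ix was deprecated in pandas 0.20 and removed in 1.0; .loc is the label-based equivalent.)

df.loc[df['Conflict']=='Match Conflict', 'Words'].apply(func)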

I wanted to provide a concise one-liner but I was too late :,(

spicypumpkin
  • God I'm learning so much. Thank you for your input. Your answer led me to this question http://stackoverflow.com/questions/27667759/is-ix-always-better-than-loc-and-iloc-since-it-is-faster-and-supports-i and on and on. Thanks. – Chuck Jan 31 '17 at 22:41
  • Could you also have a function in place of `=='Match Conflict'` if you wanted to expand your criteria to something more stringent? – Chuck Jan 31 '17 at 22:43
  • I believe so. Boolean and callable indexing are detailed in the [documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html). I suppose you can write a function that returns a bool and use it like `df.ix[bool_func(df.A), 'B']`. I've never tried it myself, though. – spicypumpkin Jan 31 '17 at 22:50
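
A sketch of the idea discussed in the comments above, using `.loc` instead of `.ix`: any function that returns a boolean Series can serve as the criterion, and `.loc` also accepts a callable that receives the DataFrame. `is_conflict` and the extra label 'Merge Conflict' are hypothetical, made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': [{'Sec': ['level']}, {'Und': ['ba ']}, {'Pro': ['conf'], 'ProAbb': ['cth']}],
    'Conflict': [None, None, 'Match Conflict'],
})

def is_conflict(series):
    """Hypothetical criterion: True where the label is one of several conflict types."""
    return series.isin(['Match Conflict', 'Merge Conflict'])

# Pass the boolean Series returned by the function directly to .loc ...
selected = df.loc[is_conflict(df['Conflict']), 'Words']

# ... or hand .loc a callable, which is called with the DataFrame itself.
same_selection = df.loc[lambda frame: is_conflict(frame['Conflict']), 'Words']

print(selected)
```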