3

I'm trying to replace certain values in a pandas DataFrame column using regex, but I want to apply the regex only to rows selected by the value in another column.

A basic example:

index  col1  col2
1      yes   foobar
2      yes   foo
3      no    foobar

Using the following:

df.loc[df['col1'] == 'yes', 'col2'].replace({r'(fo)o(?!bar)' :r'\1'}, inplace=True, regex=True)

I expected the following result:

index  col1  col2
1      yes   foobar
2      yes   fo
3      no    foobar

However, it doesn't seem to work. It doesn't throw any errors or a SettingWithCopyWarning; it just does nothing. Is there an alternative way to do this?

Nordle

2 Answers

4

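To avoid chained assignment, assign the result back and remove inplace=True. Selecting with df.loc and a boolean mask returns a new object, so an in-place replace only modifies that temporary copy and df is left unchanged: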

mask = df['col1'] == 'yes'
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].replace({r'(fo)o(?!bar)' :r'\1'}, regex=True)

print (df)
  col1    col2
1  yes  foobar
2  yes      fo
3   no  foobar
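
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the example table in the question, and the variable names `subset` and `unchanged` are only illustrative. It first shows that an in-place replace on the selected rows leaves df untouched, then applies the assign-back approach.

```python
import pandas as pd

# Example data rebuilt from the table in the question.
df = pd.DataFrame(
    {"col1": ["yes", "yes", "no"], "col2": ["foobar", "foo", "foobar"]},
    index=[1, 2, 3],
)
mask = df["col1"] == "yes"

# Boolean-mask selection returns a separate Series, so replacing
# in place changes only that Series, never the original df.
subset = df.loc[mask, "col2"]
subset.replace({r"(fo)o(?!bar)": r"\1"}, regex=True, inplace=True)
unchanged = df["col2"].tolist()  # still the original values

# Assigning the replaced values back through .loc updates df itself.
df.loc[mask, "col2"] = df.loc[mask, "col2"].replace(
    {r"(fo)o(?!bar)": r"\1"}, regex=True
)
print(df)
```

The pattern only matches "foo" when it is not followed by "bar", so "foobar" is left alone in every row and only row 2 changes.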
jezrael
  • Thank you jez, as always. Will accept when it allows me. – Nordle Aug 30 '18 at 07:07
  • Is there any reason to be using the `DataFrame.replace` method @Nordle? It seems that `df['col2'] = df.col2[df.col1 == 'yes'].str.replace('(fo)o(?!bar)', r'\1')` would suit your purposes here... – Jon Clements Aug 30 '18 at 07:38
  • @JonClements what's the difference between the two, Jon? But no, there is no particular reason I'm using `df.replace`. – Nordle Aug 30 '18 at 07:54
  • @Nordle `df.replace` can be used to replace multiple columns at once with regexes or just translate one value to another using mappings... if you're just replacing a string in a single column, it's better to use `Series.str.replace` as above... jezrael may offer his opinion on that, but `df.replace` is a sledgehammer to crack a nut here... – Jon Clements Aug 30 '18 at 08:07
  • (and would correctly be written along the lines of `df.replace({'col2': '(fo)o(?!bar)'}, r'\1', regex=True)`). It might be an idea to read the docs for both and note the differences. – Jon Clements Aug 30 '18 at 08:08
  • @JonClements thank you for the info. I've had a look and yes, it seems using replace is overkill here. I've switched to a Series replacement based on the suggestion, thanks! – Nordle Aug 30 '18 at 08:49
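
A hedged sketch of the `Series.str.replace` route discussed in the comments, using the question's example data (the names `naive` and `naive_result` are only illustrative). One thing to watch: if the replaced subset is assigned to the whole column, pandas aligns on the index and the rows that were filtered out end up as NaN, so it is probably safer to assign back through the same mask. `regex=True` is passed explicitly because the default for `Series.str.replace` is a literal match in pandas 2.0 and later.

```python
import pandas as pd

# Example data rebuilt from the table in the question.
df = pd.DataFrame(
    {"col1": ["yes", "yes", "no"], "col2": ["foobar", "foo", "foobar"]},
    index=[1, 2, 3],
)
mask = df["col1"].eq("yes")
pattern = r"(fo)o(?!bar)"

# Assigning only the filtered rows to the whole column aligns on the
# index, so row 3 (col1 == "no") has no value and becomes NaN.
naive = df["col2"][mask].str.replace(pattern, r"\1", regex=True)
naive_result = df.assign(col2=naive)

# Assigning through the same mask touches only the selected rows.
df.loc[mask, "col2"] = df.loc[mask, "col2"].str.replace(
    pattern, r"\1", regex=True
)
print(df)
```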
1

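Using np.where (regex=True is passed explicitly because Series.str.replace treats the pattern as a literal string by default from pandas 2.0 onward):

import numpy as np

df.assign(
    col2=np.where(df.col1.eq('yes'), df.col2.str.replace(r'(fo)o(?!bar)', r'\1', regex=True), df.col2)
)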

  col1    col2
1  yes  foobar
2  yes      fo
3   no  foobar
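
A self-contained version of the np.where approach, as a sketch on the question's example data (the name `result` is only illustrative). Note that `assign` returns a new DataFrame rather than modifying df, so the result has to be kept or reassigned.

```python
import numpy as np
import pandas as pd

# Example data rebuilt from the table in the question.
df = pd.DataFrame(
    {"col1": ["yes", "yes", "no"], "col2": ["foobar", "foo", "foobar"]},
    index=[1, 2, 3],
)

# Compute the replacement for every row, then keep it only where
# col1 == "yes"; all other rows fall back to the original col2 value.
replaced = df["col2"].str.replace(r"(fo)o(?!bar)", r"\1", regex=True)
result = df.assign(col2=np.where(df["col1"].eq("yes"), replaced, df["col2"]))
print(result)
```

Because the regex runs over the whole column before the condition is applied, this does slightly more work than the masked `.loc` assignment, which may matter on large frames.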
user3483203