I have a pandas DataFrame with data scraped from a couple of Wiki tables. The DataFrame has a column for names, and some of these names are followed by "\r\n(head coach)". I would like to remove that suffix, so I tried this:

df['name'][df.name.str.contains(r'coach')] =\
df['name'][df.name.str.contains(r'coach')].apply(lambda x: x[0:-14])

When this runs, I get a SettingWithCopyWarning. I tried using .loc as suggested in this SO Q&A:

 mask = df.loc[:,'name'] == df['name'].str.contains(r'coach')

But every value comes back as False, so I get an empty Series when I use this mask with my DataFrame.

I'm not sure where I am going wrong with this. Any pointers?

Bryan Stafford

1 Answer

You can try this:

mask = df.name.str.contains(r'coach')
df.loc[mask, 'name'] = df.loc[mask, 'name'].str[:-14]

Or, as @piRSquared commented, this shorter line should also work, because pandas aligns the right-hand side on the index before assigning:

df.loc[mask, 'name'] = df.name.str[:-14]
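
For reference, here is a minimal, self-contained sketch of the mask approach. The sample names are made up for illustration and are not from the original data. It assumes every affected name ends with exactly "\r\n(head coach)", which is 14 characters. The mask in the question most likely came back all False because it compares each name string with the boolean result of str.contains, instead of using that boolean Series as the mask itself.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped Wiki tables.
df = pd.DataFrame({
    "name": [
        "Alice Smith\r\n(head coach)",
        "Bob Jones",
        "Cara Lee\r\n(head coach)",
    ]
})

# Boolean Series: True for rows whose name contains "coach".
mask = df["name"].str.contains("coach")

# Alternative that does not depend on the suffix length:
# remove the literal suffix wherever it occurs (no regex involved).
alt = df["name"].str.replace("\r\n(head coach)", "", regex=False)

# Select rows and column in a single .loc call, so the assignment
# targets df itself rather than a temporary copy.
df.loc[mask, "name"] = df.loc[mask, "name"].str[:-14]

print(df["name"].tolist())
```

Doing the selection and the assignment through one .loc call is what avoids the SettingWithCopyWarning; the chained form df['name'][...] = ... may assign into a temporary object. If the suffix could vary in length, the str.replace variant is likely the safer choice.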
Psidom