
[Image: sample of the data I'm running the replace method on]

I'd like to run df['series'].replace('-','') on a series in a pandas DataFrame, but nothing happens when I run the method. I assume it has to do with the fact that a dash is an operator (I might be using that term incorrectly). I've tried playing around with regex, but can't find a solution. (FYI, the dtype of the column is object.) Here is what I've tried:

df['series'] = df['series'].str.replace('-','')
df['series'] = df['series'].str.replace(r'-','')
df['series'] = df['series'].str.replace('\-','')
df['series'] = df['series'].replace('-','')
df['series'] = df['series'].replace(r'-','')
df['series'] = df['series'].replace('\-','')

I also tried all of the above with regex=False.

mcdelta
  • Did you try it with any characters that aren't dashes? – Neil May 23 '19 at 19:54
  • Please provide a sample of your data that we can test against? I'm unable to reproduce this issue. – G. Anderson May 23 '19 at 19:56
  • @Neil I did try it with characters that aren't dashes – mcdelta May 23 '19 at 20:38
  • @mcdelta and? Did it work in those cases? Please edit your post to include input and output examples! We need to reproduce the problem as G. Anderson says... – Neil May 23 '19 at 20:45
  • @Neil I just added a picture of the column of the series data that I'm trying to manipulate. Is there another way that I can provide more detail? I don't get an error when I run the above or piRSquared's code below. It just doesn't change the series. – mcdelta May 23 '19 at 20:49
  • Thanks @mcdelta. From this it looks like your problem is that the data type is not string. It says there "dtype: object". So those dates you have, are they objects? Where did you get them from? Replace will only work if dtype is string. Like in the example answer below. This is also why I asked what is the output when you use something that isn't a dash. What happens if you try replace '7' with ''? Because if that also doesn't work, then you know the issue is definitely with the data type. – Neil May 23 '19 at 20:52
  • Actually, I might be wrong about that. Seems numpy can be weird: https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object – Neil May 23 '19 at 20:56
  • @Neil - replace() works with pandas Series that are dtype: object – mcdelta May 23 '19 at 20:58

1 Answer


Setup

These are not normal hyphens, chr(45). They are en dashes, chr(8211) (U+2013).
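One way to confirm which character is actually in the data is to print the code point and Unicode name of each punctuation character. This is a minimal sketch using only the standard library; the helper name describe_chars and the sample string are made up for illustration.

```python
import unicodedata

def describe_chars(text):
    """Return (char, code point, Unicode name) for each non-alphanumeric character."""
    return [
        (ch, ord(ch), unicodedata.name(ch, 'UNKNOWN'))
        for ch in text
        if not ch.isalnum()
    ]

# Hypothetical sample value; \u2013 is the en dash, written as an escape
# so it cannot be confused with a plain hyphen.
sample = '(2017\u20132019)'

for ch, code, name in describe_chars(sample):
    print(repr(ch), code, name)
```

If the dash shows up with code 8211 (EN DASH) rather than 45 (HYPHEN-MINUS), a replace call that targets '-' will never match it.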

import pandas as pd

df = pd.DataFrame(dict(series=['hi–hi', 'ho_ho', 'hidy–ho', 'oh–no']))

  • pandas.Series.str.replace used regex by default in pandas versions before 2.0; since 2.0 the default is regex=False
  • pandas.Series.replace does not use regex by default

With pandas.Series.replace, regex=True is needed so that the pattern can match a portion of each string. Without it, replace only matches cells whose entire value equals the search string.
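The whole-value versus partial-match behavior can be seen in a small sketch. The Series contents here are made up for illustration, and the en dash is written as \u2013.

```python
import pandas as pd

# Hypothetical data: one cell contains an en dash inside a longer string,
# one cell is only an en dash, one has no dash at all.
s = pd.Series(['a\u2013b', '\u2013', 'c'])

# regex=False (the default): only cells equal to the en dash are replaced.
whole = s.replace('\u2013', '')

# regex=True: the en dash is removed wherever it occurs inside a cell.
partial = s.replace('\u2013', '', regex=True)

print(whole.tolist())
print(partial.tolist())
```

In the first result the cell 'a–b' is left unchanged, which matches the "nothing happens" symptom when the target is only part of each string.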

This works for me

df['series2'] = df['series'].replace(chr(8211), '', regex=True)
df

    series series2
0    hi–hi    hihi
1    ho_ho   ho_ho
2  hidy–ho  hidyho
3    oh–no    ohno

As does

df['series3'] = df['series'].str.replace(chr(8211), '')
df

    series series2 series3
0    hi–hi    hihi    hihi
1    ho_ho   ho_ho   ho_ho
2  hidy–ho  hidyho  hidyho
3    oh–no    ohno    ohno

Or

df['series4'] = [s.replace(chr(8211), '') for s in df['series']]
df

    series series2 series3 series4
0    hi–hi    hihi    hihi    hihi
1    ho_ho   ho_ho   ho_ho   ho_ho
2  hidy–ho  hidyho  hidyho  hidyho
3    oh–no    ohno    ohno    ohno
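If the data might contain several dash-like characters, one possible generalization (not part of the answer above) is to strip a whole set of them at once. This is a sketch under the assumption that the listed code points cover the dashes of interest; the names DASH_CODE_POINTS, DASH_TABLE, and strip_dashes are hypothetical.

```python
# Dash-like code points to remove: hyphen-minus, hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, horizontal bar, minus sign.
DASH_CODE_POINTS = [0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015, 0x2212]

# str.translate deletes any character whose code point maps to None.
DASH_TABLE = {cp: None for cp in DASH_CODE_POINTS}

def strip_dashes(text):
    """Remove every listed dash-like character from text."""
    return text.translate(DASH_TABLE)

# Sample values modeled on the question's data; \u2013 is the en dash.
values = ['(2017)', '(2017\u2013)', '(2017\u20132019)', '(2017-2019)']
cleaned = [strip_dashes(v) for v in values]
print(cleaned)
```

On a column of strings, the same function could be applied with df['series'].map(strip_dashes).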
piRSquared
  • Thanks piRSquared, but unfortunately none of these worked for me. They worked when I did your example, but in the example I have, my series is the following [(2017), (2017-), (2017-2019)], and your methods did not solve the issue. – mcdelta May 23 '19 at 20:36
  • @mcdelta please clarify if your data is strings or not? How did you create your series? Because (2017-) as in your comment above, is not valid python. replace will only work on strings... – Neil May 23 '19 at 20:43
  • @Neil the data in each cell of the series are indeed strings – mcdelta May 23 '19 at 20:50
  • @mcdelta then copy paste the output of `df['Releas1'].to_dict()` into your post instead of the image you have now. – piRSquared May 23 '19 at 20:52
  • @piRSquared the output is {0: '(2017)', 1: '(2017)', 2: '(2017)', 3: '(2017)', ...37: '(2017)', 38: '(2017–2019)', 39: '(2017)', 40: '(2017)', 41: '(2017)', 42: '(2017)', 43: '(2017)', 44: '(2017–)', 45: '(2017–)', 46: '(2017)', 47: '(2017)', 48: '(2017–)', 49: '(2017)'} – mcdelta May 23 '19 at 20:55
  • That isn't a normal dash. Try `s.str.replace(chr(8211), '')` – piRSquared May 23 '19 at 21:00