
I am trying to clean some data in a DataFrame. Some values in one column end with a stray `)`, and I want to remove it (replace it with an empty string) in the rows that contain it.

E.g. `1948)`

I have identified the rows that have the extra `)` character. I tried using `str.replace`, but it didn't seem to work:


movies['Year'].str[-1]==')'.replace(')','')

The code seems to run but doesn't clean the data.
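If only the rows that end in `)` should be touched, one option is to build a boolean mask with `str.endswith` and assign through `.loc`. This is a minimal sketch assuming the column holds strings and is named `Year` as in the question; the sample values are made up. `na=False` keeps missing values out of the mask.

```python
import pandas as pd

# Hypothetical sample data; 'Year' is the column name used in the question
movies = pd.DataFrame({'Year': ['1948)', '1950', '1962)', '1975']})

# Boolean mask: True for rows whose value ends with ')'; missing values count as False
mask = movies['Year'].str.endswith(')', na=False)

# Rewrite only the masked rows, dropping their last character, and store the result
movies.loc[mask, 'Year'] = movies.loc[mask, 'Year'].str[:-1]

print(movies['Year'].tolist())  # ['1948', '1950', '1962', '1975']
```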

Dvyn Resh
vp_5614
  • Possible duplicate of [Python Pandas: How to replace a characters in a column of a dataframe?](https://stackoverflow.com/questions/28986489/python-pandas-how-to-replace-a-characters-in-a-column-of-a-dataframe) – FabioSpaghetti Jul 30 '19 at 07:14
  • The right hand side of your statement, `')'.replace(')','')`, always gets converted to an empty string, `''`. You are also using a double equal sign, so this is a comparison, not an assignment. So this expression just tests whether the last character of each entry is an empty string (which it can never be), and returns the result as a series (all `False`s). – Matthias Fripp Jul 30 '19 at 08:30
  • You could just use `movies['Year'] = movies['Year'].str.replace(')','')`. – Matthias Fripp Jul 30 '19 at 08:38
  • Yeah. Realised that the == only compares and does not assign. Modified the code for it to run. Thanks! – vp_5614 Aug 03 '19 at 17:51
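The diagnosis and fix in Matthias Fripp's comments can be checked directly. In the sketch below the three-row frame is made up and only the `Year` column name comes from the question. Passing `regex=False` is not part of the comment; it makes explicit that `)` is matched as a literal character, since as a regular expression a lone `)` would be an unbalanced parenthesis.

```python
import pandas as pd

movies = pd.DataFrame({'Year': ['1948)', '1950', '1962)']})

# The method call is evaluated before ==, so the right-hand side is just ''
rhs = ')'.replace(')', '')
print(repr(rhs))  # ''

# The original line is therefore a comparison with '', which a last character
# never equals, and nothing is assigned back to the DataFrame
check = movies['Year'].str[-1] == ')'.replace(')', '')
print(check.tolist())  # [False, False, False]

# Assigning the result of str.replace back to the column does clean it
movies['Year'] = movies['Year'].str.replace(')', '', regex=False)
print(movies['Year'].tolist())  # ['1948', '1950', '1962']
```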

2 Answers


If the `)` only occurs at the end of the string, use `pandas.Series.str.rstrip`:

import pandas as pd

s = pd.Series(['1)', '2', '3)', '4', '5)'])
s.str.rstrip(')')

Output:

0    1
1    2
2    3
3    4
4    5
dtype: object
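Note that `rstrip` returns a new Series rather than changing `s` in place, so in a DataFrame the result has to be assigned back to the column. A short sketch with made-up values (only the `Year` column name is taken from the question) showing that, and that `rstrip(')')` removes any number of trailing `)` characters while leaving an inner one alone:

```python
import pandas as pd

# Hypothetical frame; only the 'Year' column name comes from the question
movies = pd.DataFrame({'Year': ['1948)', '1950', '1962))', '19)75']})

# rstrip returns a new Series, so assign it back to keep the change.
# Every trailing ')' is removed; a ')' in the middle is untouched.
movies['Year'] = movies['Year'].str.rstrip(')')

print(movies['Year'].tolist())  # ['1948', '1950', '1962', '19)75']
```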
Chris

This works:

import pandas as pd

movies = pd.DataFrame({'a':[1,2,3,4,5,6],'year':['1998)', '1999', '2000)', '2001)', '2002', '2003)']})
movies['year'] = movies['year'].apply(lambda s: s.replace(')', ''))  # drop every ')' in each string

This replaces every `)` in the value, not only a trailing one.
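One caveat, assuming the column may contain missing values (the `None` in the sample below is made up): a bare `lambda` calls `.replace` on every element, so it fails on a missing entry, whereas the `.str` accessor skips missing values on its own. The sketch also shows a guarded lambda as an alternative.

```python
import pandas as pd

# Hypothetical column with a missing value
years = pd.Series(['1998)', None, '2000)'])

# The plain lambda calls .replace on None and raises AttributeError
try:
    years.apply(lambda s: s.replace(')', ''))
    apply_failed = False
except AttributeError as err:
    apply_failed = True
    print('apply failed:', err)

# Guarding the lambda leaves non-string values unchanged
guarded = years.apply(lambda s: s.replace(')', '') if isinstance(s, str) else s)

# The .str accessor passes missing values through without calling replace on them
vectorised = years.str.replace(')', '', regex=False)

print(guarded.tolist())
print(vectorised.tolist())
```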

Kyle
  • I'm not sure why I was downvoted, perhaps it's not as efficient as other methods? It certainly works, in any case. – Kyle Aug 06 '19 at 09:04