I have a pandas.Series des, which contains all the text:

I want to remove all punctuation, so I did the following:

for i in range(len(des)):
    for ch in punc:
        if ch in des[i]:
            des[i] = des[i].replace(ch, "", inplace=True)

However, I got a "TypeError: replace() takes no keyword arguments".

How can I fix it? Also, is there any more efficient way to remove punctuation for all rows of text in a series?

ernest_k
Eve

3 Answers

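You can create a dictionary of the items to replace and pass it to pandas.DataFrame.replace. Because regex=True treats each key as a regular expression, escape the punctuation characters first; otherwise a key such as "." matches every character.

import re

# create a dictionary 'mydict' mapping each escaped punctuation character to ""
mydict = {re.escape(item): "" for item in punc}

# if 'des' is a data frame, replace within the column 'FullDescription' using the dictionary
des = des.replace({"FullDescription": mydict}, regex=True)

# if 'des' is a series, use
des = des.replace(mydict, regex=True)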

With inplace=True, replace returns None and modifies the object directly, so there is no need to assign the result back:

# using inplace for a data frame
des.replace({"FullDescription": mydict}, regex=True, inplace=True)
# using inplace for a series
des.replace(mydict, regex=True, inplace=True)
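As a minimal, self-contained sketch of the dictionary approach on a series: the sample data is made up, and punc is assumed to be string.punctuation, since the question does not show how it is defined. Each key is escaped so that characters such as "." and "?" are matched literally.

```python
import re
import string

import pandas as pd

# Assumption: `punc` is the standard ASCII punctuation set.
punc = string.punctuation

# Hypothetical sample data standing in for the asker's series.
des = pd.Series(["Hello, world!", "What... is (this)?"])

# Escape every character so regex metacharacters are treated literally.
mydict = {re.escape(ch): "" for ch in punc}

cleaned = des.replace(mydict, regex=True)
print(cleaned.tolist())
```

Without re.escape, the unescaped "." key would match any character and empty every string.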
Shijith
First of all, you have an indentation error in the inner for loop. Secondly, once the indentation is fixed, the underlying problem is that the replace() method of a dataframe (or series) and the replace() method of a string have different signatures. des[i] is a plain Python string, so des[i].replace(...) calls the string method, which has no inplace argument. You want the dataframe's replace() method, but you are using the string's. You can check this here: https://stackoverflow.com/a/50843478/9851541. You can also check "How to use the replace() method with keyword arguments to replace empty strings" for your problem. Hope this helps!
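The difference between the two methods can be seen in a small sketch. The sample strings are made up; the point is only that the built-in string method rejects inplace, while the pandas method accepts it.

```python
import pandas as pd

text = "Hello, world!"

# The built-in str.replace returns a new string and has no `inplace` parameter.
stripped = text.replace(",", "")

raised = False
try:
    text.replace(",", "", inplace=True)
except TypeError:
    # This is the error reported in the question.
    raised = True

# The pandas method accepts `inplace` and then returns None.
des = pd.Series(["Hello, world!", "a,b"])
result = des.replace(",", "", regex=True, inplace=True)
print(stripped, raised, result, des.tolist())
```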

Swati Srivastava
My interpretation of your question might be incorrect, but if you are cycling through a list of punctuation characters in punc and just want to remove all of them while keeping the rest of the text, I think you can do something simpler like the following:

for ch in punc:
    # regex=False treats ch as a literal character rather than a regular expression
    des = des.str.replace(ch, "", regex=False)

As you probably know, replace is the standard Python string method for replacing one sequence of characters with another. E.g.:

'abc'.replace('b', 'z')

returns 'azc'

When you use Series.str.replace() you get the same kind of replacement, but applied to every element of the Series. AFAIK, most string methods can be applied element-wise to a Series using this same syntax: Series.str.some_string_method(). Note that Series.str.replace() can also interpret the pattern as a regular expression, depending on its regex argument.
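On the efficiency part of the question, the loop over punc can usually be avoided by removing all punctuation in a single vectorized call. The sketch below shows two options, assuming punc is string.punctuation and using made-up sample data: a regex character class with Series.str.replace, and a translation table with Series.str.translate.

```python
import re
import string

import pandas as pd

# Assumption: `punc` is the standard ASCII punctuation set.
punc = string.punctuation

# Hypothetical sample data standing in for the asker's series.
des = pd.Series(["Hello, world!", "What... is (this)?"])

# Option 1: one regex character class covering every punctuation character.
pattern = f"[{re.escape(punc)}]"
no_punc_regex = des.str.replace(pattern, "", regex=True)

# Option 2: a translation table that deletes every punctuation character.
table = str.maketrans("", "", punc)
no_punc_translate = des.str.translate(table)

print(no_punc_regex.tolist())
print(no_punc_translate.tolist())
```

Both calls make one pass over the series rather than one pass per punctuation character. Which is faster on a given dataset is worth measuring rather than assuming.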

Brett Romero
  • Also, do you mind explaining a bit why it works with Pandas.Series? Why python can interpret it as every element in each row? Because it doesn't work if I tokenize each row, i.e. if each row becomes a list... – Eve Jan 27 '20 at 15:34
  • Updated the answer with some explanation. Please don't forget to mark it correct if you are happy with my answer! – Brett Romero Jan 29 '20 at 05:33