How to remove punctuation in pandas table without converting it into string? (str.replace doesn't work here)

Asked Apr 27 '22 at 11:31

Active Apr 27 '22 at 11:59

Viewed 28 times

I want my code to perform semantic analysis and write the word counts to a CSV table:

from collections import Counter
import pandas as pd


stoplist = ['.', 'and', 'was', 'in', 'a', 'the', ',', '?', ':', 'of']
text1 = str(input("Paste text here: "))

words1 = [s.lower() for s in text1.split() if s.lower() not in stoplist]
data = {'quantity': words1}
df = pd.DataFrame(data)
df = df['quantity'].value_counts()
df.to_csv('seo.csv')

The stoplist works for words, but it does not work for punctuation.

Many people suggested using .str.replace(r'[^\w\s]+', ''), but it doesn't work here:

AttributeError: Can only use .str accessor with string values!
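
For context, here is a minimal sketch of where that error most likely comes from, assuming `.str.replace` was called on the result of `value_counts()`. That result holds integer counts (the words end up in the index), so the `.str` accessor is not available. The sample tokens are hypothetical and stand in for the pasted text.

```python
import pandas as pd

# Hypothetical sample tokens standing in for the pasted text
words = ["hello,", "world.", "hello,"]
df = pd.DataFrame({"quantity": words})

counts = df["quantity"].value_counts()
# The counts are integers; the words themselves sit in the index
values_are_integers = pd.api.types.is_integer_dtype(counts)

try:
    # .str is only defined for string values, so this raises AttributeError
    counts.str.replace(r"[^\w\s]+", "", regex=True)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The same call on the string column, made before counting, works
cleaned = df["quantity"].str.replace(r"[^\w\s]+", "", regex=True)
cleaned_counts = cleaned.value_counts()
```

In this sketch `cleaned_counts` counts `hello` twice and `world` once, because the punctuation is stripped while the column still contains strings.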


yegor

Sorry, you need to remove punctuation from the text, not from a column, so I changed the dupe link. – jezrael Apr 27 '22 at 11:34
Have you checked the data type? What data type is it? – dimas krisrianto Apr 27 '22 at 11:34
cast it first, .astype(str).str.replace() – nfn Apr 27 '22 at 11:34
@nfn so I changed df = df['quantity'].value_counts() into df = df['quantity'].value_counts().astype(str).str.replace(stoplist, '') and now I get an error: unhashable type: 'list' – yegor Apr 27 '22 at 12:01
@dimaskrisrianto it's Series – yegor Apr 27 '22 at 12:03
@yegor I mean, what is the data type of the values in that `Series`? As @nfn mentioned, you can convert every value inside the `Series` into a string with `astype('str')` and then use `str.replace()` to replace or remove any character inside every value, unless you have an unconvertible data type mixed into that `Series`. – dimas krisrianto Apr 28 '22 at 11:10
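
Following the first comment's suggestion to clean the text rather than the column, one possible sketch is to strip punctuation with `re.sub` before splitting, so no `.str` accessor is needed at all. The helper name `count_words` and the sample sentence are hypothetical. Note also that `str.replace` takes a single pattern string, which is presumably why passing the whole `stoplist` list produced the `unhashable type: 'list'` error.

```python
import re
from collections import Counter

import pandas as pd

# Punctuation no longer needs to be listed here; it is removed by the regex
stoplist = {"and", "was", "in", "a", "the", "of"}


def count_words(text):
    """Return word frequencies for text, ignoring punctuation and stop words."""
    # Replace every run of characters that are neither word characters
    # nor whitespace with a single space
    cleaned = re.sub(r"[^\w\s]+", " ", text.lower())
    words = [w for w in cleaned.split() if w not in stoplist]
    # Counter is a dict subclass, so pandas can build a Series from it
    return pd.Series(Counter(words), name="quantity").sort_values(ascending=False)


sample = "The cat was in the hat, and the hat was red. Red hat?"
counts = count_words(sample)
# counts.to_csv("seo.csv") would then write the table as in the question
```

Replacing punctuation with a space is a design choice: it keeps words joined only by punctuation apart, but it also splits contractions such as "don't" into two tokens. Replacing with an empty string instead would keep contractions together.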
