I want my code to perform semantic analysis and create a CSV table:

from collections import Counter
import pandas as pd


stoplist = ['.', 'and', 'was', 'in', 'a', 'the', ',', '?', ':', 'of']
text1 = str(input("Paste text here: "))

words1 = [s.lower() for s in text1.split() if s.lower() not in stoplist]
data = {'quantity': words1}
df = pd.DataFrame(data)
df = df['quantity'].value_counts()
df.to_csv('seo.csv')

The stoplist works for words, but it does not work for punctuation.

Many people suggested using .str.replace(r'[^\w\s]+', ''), but it doesn't work here:

AttributeError: Can only use .str accessor with string values!
yegor
  • Sorry, you need to remove punctuation from the text, not from the column, so I changed the dupe link. – jezrael Apr 27 '22 at 11:34
  • Have you checked the data type? What data type is it? – dimas krisrianto Apr 27 '22 at 11:34
  • cast it first, .astype(str).str.replace() – nfn Apr 27 '22 at 11:34
  • @nfn So I changed df = df['quantity'].value_counts() into df = df['quantity'].value_counts().astype(str).str.replace(stoplist, '') and now I get the error unhashable type: 'list' – yegor Apr 27 '22 at 12:01
  • @dimaskrisrianto it's Series – yegor Apr 27 '22 at 12:03
  • @yegor I mean, what is the data type of the values in that `Series`? As @nfn mentioned, you can convert every value in the `Series` to a string with `astype('str')`, then use `str.replace()` to replace or remove any character in each value, unless the `Series` contains values that cannot be converted. – dimas krisrianto Apr 28 '22 at 11:10
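
A possible way to act on the first comment (remove punctuation from the text, not from the column) is sketched below. It rests on two observations that seem to explain the errors: value_counts() returns a Series whose values are integer counts, with the words in the index, so .str has no strings to work on; and str.replace() takes a single pattern, not a list, which would explain the unhashable type: 'list' error. The function name count_words, the constant STOPLIST, and the sample sentence are made up for illustration; the regex is the one quoted in the question.

```python
import re

import pandas as pd

# Hypothetical stoplist: words only, since punctuation is stripped separately.
STOPLIST = {'and', 'was', 'in', 'a', 'the', 'of'}


def count_words(text, stoplist=STOPLIST):
    """Return word frequencies for text, ignoring punctuation and stop words."""
    # Strip punctuation from the raw text first, so "cat," and "cat:" both become "cat".
    cleaned = re.sub(r'[^\w\s]+', '', text.lower())
    # Split on whitespace and drop stop words.
    words = [w for w in cleaned.split() if w not in stoplist]
    # value_counts() puts the words in the index and the integer counts in the values.
    return pd.Series(words, name='quantity').value_counts()


# Made-up sample text, standing in for the pasted input.
counts = count_words("The cat, the dog and the cat: was it a cat?")
```

Calling counts.to_csv('seo.csv') afterwards writes the table as in the original code. Note that this regex also removes apostrophes and hyphens inside words (for example, "don't" becomes "dont"), which may or may not be what you want.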
