I am trying to remove stopwords from my data, but they are still present in the output. This is the code I am using:
def gen_freq(text):
    word_list = []  # stores the list of words
    for words in text.split():  # loop over all the reviews and extract words into word_list
        word_list.extend(words)
    word_freq = pd.Series(word_list).value_counts()  # create word frequencies using word_list
    word_freq[:20]
    # Print top 20 words
    print(word_freq)
gen_freq(dataset.text.str)
text = dataset.text.apply(lambda x: clean_text(x))
word_freq = gen_freq(text.str)*100
word_freq = word_freq.drop(labels=STOPWORDS, errors='ignore')
I loaded the data as follows:
with open('reviews.json') as project_file:
    data = json.load(project_file)
dataset = pd.json_normalize(data)
print(dataset.head())
I get the error:
unsupported operand type(s) for *: 'NoneType' and 'int'
on this line:
word_freq = gen_freq(text.str)*100
and the stopwords are still not removed. How can I fix this?
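For reference, here is a minimal sketch of what seems to be going on. It assumes the cause is that gen_freq prints its result but never returns it, so the call evaluates to None, the multiplication fails, and the drop on the following line is never reached. STOPWORDS_SAMPLE and reviews are made-up stand-ins for STOPWORDS and dataset.text so that the snippet runs on its own. The lowercasing is also an assumption, in case the stopword list is all lowercase.

```python
import pandas as pd

# Hypothetical stand-ins so the sketch runs on its own: a tiny stopword set
# (instead of the real STOPWORDS) and a three-row Series (instead of dataset.text).
STOPWORDS_SAMPLE = {"the", "was", "and", "a"}
reviews = pd.Series([
    "The food was good",
    "The service was slow",
    "Good food and a good price",
])


def gen_freq(text):
    """Return word counts for a Series of strings, most frequent first."""
    word_list = []
    # text.str.split() yields one list of words per review
    for words in text.str.split():
        word_list.extend(words)
    # Return the counts instead of printing them, so the caller gets a Series
    return pd.Series(word_list).value_counts()


# Lowercase first so that "The" and "the" are counted as the same word
word_freq = gen_freq(reviews.str.lower())
# Drop stopwords by label; errors="ignore" skips stopwords that never occur
word_freq = word_freq.drop(labels=list(STOPWORDS_SAMPLE), errors="ignore")
print(word_freq.head(20))
```

Because the function now returns a Series, the result can be sliced, scaled, or passed to drop afterwards. Note that a bare word_freq[:20] on its own line does nothing, since the slice is never assigned or returned.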