Generating word frequencies during data cleaning

Question

I have been trying to remove stopwords from the data, but they still appear in the output. This is the code I am using:

def gen_freq(text):
    word_list = []  # stores the list of words

    for words in text.split():  # Loop over all the reviews and extract words into word_list
        word_list.extend(words)

    word_freq = pd.Series(word_list).value_counts()  # Create word frequencies using word_list

    word_freq[:20]

    # Print top 20 words
    print(word_freq)


gen_freq(dataset.text.str)

text = dataset.text.apply(lambda x: clean_text(x))
word_freq = gen_freq(text.str)*100
word_freq = word_freq.drop(labels=STOPWORDS, errors='ignore')

I loaded the data with:

import json
import pandas as pd

with open('reviews.json') as project_file:
    data = json.load(project_file)
dataset = pd.json_normalize(data)
print(dataset.head())
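For context, a small sketch of what `pd.json_normalize` does with records like these. It assumes `reviews.json` holds a list of objects that each have a `text` field (implied by `dataset.text`, but the real file is not shown); the records and the nested `meta` field below are hypothetical.

```python
import pandas as pd

# Hypothetical records standing in for the contents of reviews.json;
# the real file's structure is not shown in the question.
data = [
    {"id": 1, "text": "Great food", "meta": {"stars": 5}},
    {"id": 2, "text": "Slow service", "meta": {"stars": 2}},
]

# json_normalize turns a list of dicts into a DataFrame, one row per record;
# nested dicts become dotted column names such as "meta.stars".
dataset = pd.json_normalize(data)

# dataset.text is then an ordinary Series of strings, one review per row.
print(dataset.columns.tolist())
print(dataset.text.tolist())
```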

I get the error `TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'` on this line:

word_freq = gen_freq(text.str)*100

and the stopwords are still not removed. How can I fix this?

That's because your `gen_freq` function does not return anything. Maybe you're missing this at the end? `return word_freq[:20]` — aaossa, Mar 17 '22 at 19:57
Does this answer your question? [Why is the output of my function printing out "None"?](https://stackoverflow.com/questions/7053652/why-is-the-output-of-my-function-printing-out-none) — JonSG, Mar 17 '22 at 20:01
