this is the sample of my Pandas data frame, containing 30,000 rows [excluding column headers]. The expression comes with two classes, mainly Sad and Happy.
Expression Description
Sad "people are sad because they got no money."
Happy "people are happy because ..."
Sad "people are miserable because they broke up"
Happy "They got good money"
Based on the example above, I would like to count the number of frequencies, which allows me to the number of word occurrences of "Sad" and "Happy" Expression's description in a dictionary. e.g. {sad:{people:2}, happy:{happy:1}}
This is my code:
def calculate_word_frequency(lst, classes):
#variable
wordlist = []
dict_output = {}
count = 0
term = ""
data = [lst.columns.values.tolist()] + lst.values.tolist() #to convert into a list
for i in range(1,len(data)):
if data[i][0] == classes[0]:
wordlist = data[i][1].lower().split(" ")
for words in wordlist:
wordlist.append(words)
for word in wordlist:
if word in dict_output:
dict_output[wordlist] += 1
else:
dict_output[wordlist] == 1
print(dict_output)
Expected output would be based on the number of words appearing in each Expression respectively.
#Test case:
words, freqs_per_expression = calculate_word_frequency(social_df, ["Sad", "Happy"])
#output: 538212
print(freqs_per_class["sad"]["people"]) #output: 203
Because of the dataset, I often face frequent hangs and lags on my VS. Hence, I am unable to retrieve any results. I wondered if there are any better techniques that I can utilise so that I can achieve my desired data of {word:count}.
Thank you!