I have two dataframes. One consists of a series of processed excerpts (split, stopwords and punctuation removed) and the other consists of a corpus of words and a corresponding 'frequency' score.

I am trying to obtain an 'average frequency score' for each excerpt in my dataframe. To do this, I want a function or loop that takes each word in each excerpt, matches it to the frequency score given in the corpus dataframe, and then sums these scores and finds their average. I am having trouble doing this. My code so far:

def average_frequency_score(text):
    for word in text:
        text_freq = []
        if word == word_freq_df[word_freq_df['words'][i]]:
            freq = word_freq_df['frequency'][i]
            text_freq.append(freq)
        else:
            freq = 9.0
            text_freq.append(freq)

df['frequencies'] = df['fully_processed'].apply(average_frequency_score)

excerpt = ['roger', 'predicted', 'snow', 'departed', 'quickly', 'came', 'two', 'days', 'sleigh', 'ride', 'scarcely', 'vestige', 'white', 'ground', 'tennis', 'possible', 'great', 'game', 'progress', 'court', 'pine', 'laurel', 'patty', 'roger', 'playing', 'elise']


word_freq_df[1:5] 


   words    frequency
1   home    20.9677
2   us      20.9296
3   page    20.8022
4   search  20.7471


I would then apply another function to obtain the average. In the code above, I am trying to use the index i into word_freq_df to look up the correct frequency, but I get an error saying that i is not defined. Can anyone help me with this?
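
For reference, here is a minimal sketch of one possible approach, not a definitive fix. In the code above, i is never assigned, text_freq is reset on every pass through the loop, and the function returns nothing. The sketch avoids indexing altogether by looking each word up in a plain dict. The name freq_lookup, the default parameter, and the choice to return NaN for an empty excerpt are assumptions made for illustration; the 9.0 fallback is taken from the code above.

```python
from statistics import mean

DEFAULT_FREQ = 9.0  # fallback score for words missing from the corpus (value from the question)


def average_frequency_score(words, freq_lookup, default=DEFAULT_FREQ):
    """Return the mean frequency score of the words in one excerpt."""
    if not words:
        # Assumption: an empty excerpt has no meaningful average.
        return float("nan")
    # dict.get falls back to `default` when a word is not in the corpus.
    return mean(freq_lookup.get(word, default) for word in words)


# Small stand-in for the corpus, using the four rows shown in the question.
freq_lookup = {"home": 20.9677, "us": 20.9296, "page": 20.8022, "search": 20.7471}

sample = ["home", "page", "snow"]  # "snow" is not in the stand-in corpus
score = average_frequency_score(sample, freq_lookup)
print(round(score, 4))
```

With the real dataframes, the lookup could be built once with dict(zip(word_freq_df['words'], word_freq_df['frequency'])) and the function applied with df['fully_processed'].apply(lambda words: average_frequency_score(words, freq_lookup)). This assumes each cell of fully_processed holds a list of words and that the words column has no duplicates.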

kl999