Extracting data from a pandas dataframe on string matching

Question

I have two dataframes. One holds processed excerpts (tokenized, with stopwords and punctuation removed); the other holds a corpus of words, each with a corresponding 'frequency' score.

I am trying to obtain an 'average frequency score' for each excerpt in my dataframe. To do this I want a function or loop that takes each word in each excerpt, matches it to the frequency score given in the corpus dataframe, and then sums these scores and takes their average. I am having trouble doing this. My code so far:

def average_frequency_score(text):
    for word in text:
        text_freq = []
        if word == word_freq_df[word_freq_df['words'][i]]:
            freq = word_freq_df['frequency'][i]
            text_freq.append(freq)
        else:
            freq = 9.0
            text_freq.append(freq)

df['frequencies'] = df['fully_processed'].apply(average_frequency_score)

excerpt = ['roger', 'predicted', 'snow', 'departed', 'quickly', 'came', 'two', 'days', 'sleigh', 'ride', 'scarcely', 'vestige', 'white', 'ground', 'tennis', 'possible', 'great', 'game', 'progress', 'court', 'pine', 'laurel', 'patty', 'roger', 'playing', 'elise']


word_freq_df[1:5] 


   words    frequency
1   home    20.9677
2   us      20.9296
3   page    20.8022
4   search  20.7471

I would then apply another function to obtain the average. In the code above I am trying to use an index `i` into word_freq_df to look up the correct frequency, but I get an error saying that `i` is not defined. Can anyone help me with this?

You should provide a [mcve] that includes example data and expected output. — Alex, May 28 '21 at 17:12
It's not any clearer: is `excerpt` an example row from `df["fully_processed"]`? Where does the variable `i` come from in your function? This isn't a [mcve], as it will not run as is. — Alex, May 28 '21 at 18:50
See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391) for tips. — AlexK, May 28 '21 at 18:59
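
One possible way to approach this, as a minimal sketch rather than a definitive answer: build a word-to-frequency dictionary from word_freq_df once, then look each word up with `dict.get`, so no index `i` is needed. The sample data below is hypothetical and only modelled on the question. The sketch assumes each row of `df['fully_processed']` is a list of words (as `excerpt` suggests), that words missing from the corpus should score 9.0 as in the original `else` branch, and that an empty excerpt should give NaN.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; the real frames are larger.
word_freq_df = pd.DataFrame({
    "words": ["home", "us", "page", "search"],
    "frequency": [20.9677, 20.9296, 20.8022, 20.7471],
})
df = pd.DataFrame({
    "fully_processed": [
        ["home", "page", "snow"],  # 'snow' is not in the sample corpus
        ["search", "us"],
        [],                        # empty excerpt
    ]
})

# Fallback score for words missing from the corpus (9.0, as in the question's code).
DEFAULT_FREQ = 9.0

# Build the word -> frequency lookup once, instead of scanning the dataframe per word.
# If a word appears more than once in word_freq_df, the last occurrence wins.
freq_lookup = dict(zip(word_freq_df["words"], word_freq_df["frequency"]))


def average_frequency_score(words, lookup=freq_lookup, default=DEFAULT_FREQ):
    """Return the mean frequency score of a list of words."""
    if not words:
        # Assumption: an empty excerpt has no meaningful average.
        return float("nan")
    scores = [lookup.get(word, default) for word in words]
    return sum(scores) / len(scores)


df["frequencies"] = df["fully_processed"].apply(average_frequency_score)
```

Because the function returns the average directly, a second function for averaging is not needed. The lookup dictionary is built outside the function so it is created once rather than once per row.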
