I want to replicate the "Common words" readability measure from a paper in R.
The authors describe their procedure as follows: "To construct Common words, ..., we first determine the relative frequency of all words occurring in all documents. We then calculate Common words as the average of this proportion for every word occurring in a given document. The higher the value of common words, the more ordinary is the document's language and thus the more readable it should be." (Loughran & McDonald 2014)
Can anybody help me with this? I work with corpus objects (quanteda) to analyze the text documents in R.
I have already computed the relative frequency of all words occurring in all documents as follows:
library(quanteda)
library(quanteda.textstats)   # textstat_frequency() lives here in recent quanteda versions

dfm_Notes_Summary <- dfm(tokens_Notes_Summary)
Summary_FreqStats_Notes <- textstat_frequency(dfm_Notes_Summary)
Summary_FreqStats_Notes$RelativeFreq <-
  Summary_FreqStats_Notes$frequency / sum(Summary_FreqStats_Notes$frequency)
That is, I converted the tokens object (tokens_Notes_Summary) into a dfm object (dfm_Notes_Summary) and computed the relative frequency of every word across all documents.
Now I am struggling to calculate the average of these proportions over the words occurring in each given document.
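Concretely, I think the calculation would look something like the sketch below (the toy corpus here stands in for my tokens_Notes_Summary, and the paper's wording leaves open whether the average runs over the unique words of a document or over all its tokens, so I show both readings):

```r
library(quanteda)

# toy corpus standing in for tokens_Notes_Summary (assumption)
toks <- tokens(c(doc1 = "the cat sat on the mat",
                 doc2 = "the dog barked"))
counts <- as.matrix(dfm(toks))        # documents x features count matrix

# relative frequency of each word across all documents
rel_freq <- colSums(counts) / sum(counts)

# Reading 1: average over the unique words occurring in a document
present <- counts > 0
common_unique <- as.vector(present %*% rel_freq) / rowSums(present)

# Reading 2: token-weighted average (each word occurrence counts once)
common_tokens <- as.vector(counts %*% rel_freq) / rowSums(counts)
```

For a large corpus one would keep the dfm sparse instead of calling as.matrix(), but the two matrix products above express the idea compactly.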