I have to write a script that lists all content words in descending order of frequency. Since I need the 10 most frequent content words, it is not enough to list the 10 most frequent words in my corpus; I also need to filter out the function words (and, or, ...) and any punctuation. What I have so far is the following:
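import nltk
# corpus is an NLTK corpus reader that has already been loaded
fileids = corpus.fileids()
text = corpus.words(fileids)
wlist = []
# count how often each word occurs in the corpus
ftable = nltk.FreqDist(text)
wlist.append(ftable.keys())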
This gives me a very neat list of all words in descending order of frequency, but how do I filter out the function words?
Thank you.
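One possible approach, offered only as a sketch: drop every token that is not purely alphabetic (which removes punctuation) or that appears in a list of function words, then count what is left. The example below uses only the standard library so that it runs on its own. The names `tokens`, `function_words` and `top_content_words` are hypothetical stand-ins, not part of the code above. With NLTK, `tokens` would be `corpus.words(fileids)`, and the stop word set could plausibly come from `nltk.corpus.stopwords.words('english')`, assuming the stopwords data has been downloaded.

```python
from collections import Counter

# Hypothetical stand-ins: a tiny token list in place of corpus.words(fileids),
# and a hand-written function-word set in place of a full stop word list.
tokens = "the cat and the dog saw the cat , and the bird saw a dog .".split()
function_words = {"the", "and", "or", "a", "an", "of", "to", "in"}


def top_content_words(words, stop_words, n=10):
    """Return up to n (word, count) pairs, most frequent first,
    skipping function words and non-alphabetic tokens."""
    content = [
        w.lower()
        for w in words
        # isalpha() rejects punctuation tokens such as "," and "."
        if w.isalpha() and w.lower() not in stop_words
    ]
    return Counter(content).most_common(n)


top = top_content_words(tokens, function_words)
for word, count in top:
    print(word, count)
```

Lowercasing before the lookup means "The" and "the" are treated as the same word. Whether that is wanted depends on the corpus, so treat it as an assumption rather than a requirement.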