Essentially, nltk.probability.FreqDist
is a collections.Counter
object (https://github.com/nltk/nltk/blob/develop/nltk/probability.py#L61). Given a dictionary object, there are several way to filter it:
1. Read into a FreqDist and filter it with a lambda function
>>> import nltk
>>> text = "Hello, this is my sentence. It is a very basic sentence with not much information in it"
>>> tokenized_text = nltk.word_tokenize(text)
>>> stopwords = nltk.corpus.stopwords.words('english')
>>> word_freq = nltk.FreqDist(tokenized_text)
>>> dict_filter = lambda word_freq, stopwords: dict( (word,word_freq[word]) for word in word_freq if word not in stopwords )
>>> filtered_word_freq = dict_filter(word_freq, stopwords)
>>> len(word_freq)
17
>>> len(filtered_word_freq)
8
>>> word_freq
FreqDist({'sentence': 2, 'is': 2, 'a': 1, 'information': 1, 'this': 1, 'with': 1, 'in': 1, ',': 1, '.': 1, 'very': 1, ...})
>>> filtered_word_freq
{'information': 1, 'sentence': 2, ',': 1, '.': 1, 'much': 1, 'basic': 1, 'It': 1, 'Hello': 1}
2. Read into a FreqDist and filter it with dictionary comprehension
>>> word_freq
FreqDist({'sentence': 2, 'is': 2, 'a': 1, 'information': 1, 'this': 1, 'with': 1, 'in': 1, ',': 1, '.': 1, 'very': 1, ...})
>>> filtered_word_freq = dict((word, freq) for word, freq in word_freq.items() if word not in stopwords)
>>> filtered_word_freq
{'information': 1, 'sentence': 2, ',': 1, '.': 1, 'much': 1, 'basic': 1, 'It': 1, 'Hello': 1}
3. Filter the words before reading into a FreqDist
>>> import nltk
>>> text = "Hello, this is my sentence. It is a very basic sentence with not much information in it"
>>> tokenized_text = nltk.word_tokenize(text)
>>> stopwords = nltk.corpus.stopwords.words('english')
>>> filtered_tokenized_text = [word for word in tokenized_text if word not in stopwords]
>>> filtered_word_freq = nltk.FreqDist(filtered_tokenized_text)
>>> filtered_word_freq
FreqDist({'sentence': 2, 'information': 1, ',': 1, 'It': 1, '.': 1, 'much': 1, 'basic': 1, 'Hello': 1})