NLTK - how to get items whose frequency is greater than a specific number

Question

I'm trying to get only the items in a frequency distribution whose count is over a certain number.

Example:

import nltk
test_list = ['aa', 'aa', 'bb', 'cc', 'dd', 'dd']
test_fd = nltk.FreqDist(test_list)

Returns:

FreqDist({'aa': 2, 'dd': 2, 'bb': 1, 'cc': 1})

Without writing an explicit loop, I am looking for all the items with a count greater than 1.

Using Python 3.8 and NLTK 3.5

You would need a loop. Even a solution that technically 'doesn't use a loop' would still loop internally. Use something like [this](https://stackoverflow.com/a/40555781/8661686). — agupta, Oct 29 '20 at 14:09
Does this answer your question? [Finding frequency distribution of a list of numbers in python](https://stackoverflow.com/questions/40553332/finding-frequency-distribution-of-a-list-of-numbers-in-python) — agupta, Oct 29 '20 at 14:11

score 0 · Answer 1 · answered Oct 29 '20 at 14:02

Here is a possible solution:

test_fd = nltk.FreqDist({k: v for k, v in test_fd.items() if v > 1})
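For anyone who wants to run this end to end, here is a minimal sketch of the same dict comprehension wrapped in a helper. The names `items_above` and `threshold` are made up for illustration and are not part of NLTK. It relies on `FreqDist` being a subclass of `collections.Counter`, and falls back to a plain `Counter` if NLTK is not installed.

```python
from collections import Counter

try:
    from nltk import FreqDist  # FreqDist subclasses collections.Counter
except ImportError:
    FreqDist = Counter  # stand-in so the sketch still runs without NLTK


def items_above(fd, threshold):
    """Hypothetical helper: keep only the items whose count exceeds threshold."""
    # Rebuild the same kind of object (FreqDist or Counter) from the filtered pairs.
    return type(fd)({k: v for k, v in fd.items() if v > threshold})


test_list = ['aa', 'aa', 'bb', 'cc', 'dd', 'dd']
test_fd = FreqDist(test_list)

frequent = items_above(test_fd, 1)
print(dict(frequent))  # {'aa': 2, 'dd': 2}
```

The comprehension still iterates over every item internally, so this only avoids writing the loop by hand.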

Riccardo Bucco

Luca Massaron · Answer 2 · 2020-10-29T18:35:26.800

0

This can be done with filter, and you can choose whether the output is a dict or a list of tuples:

test_fd = dict(filter(lambda x: x[1] > 1, nltk.FreqDist(test_list).items()))
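As a rough sketch of the two output shapes, the same `filter` call can be passed to either `list` or `dict`. The predicate name `has_count_above_one` is an assumption for illustration, and a plain `Counter` stands in for `FreqDist` if NLTK is not installed.

```python
from collections import Counter

try:
    from nltk import FreqDist  # FreqDist subclasses collections.Counter
except ImportError:
    FreqDist = Counter  # stand-in so the sketch still runs without NLTK


def has_count_above_one(item):
    """Hypothetical predicate: item is a (word, count) pair."""
    return item[1] > 1


test_list = ['aa', 'aa', 'bb', 'cc', 'dd', 'dd']
pairs = FreqDist(test_list).items()

# Same filter, two output shapes.
as_list = list(filter(has_count_above_one, pairs))
as_dict = dict(filter(has_count_above_one, pairs))

print(as_list)  # [('aa', 2), ('dd', 2)]
print(as_dict)  # {'aa': 2, 'dd': 2}
```

Note that the dict result is a plain `dict`, so `FreqDist` methods are no longer available on it unless it is wrapped in `FreqDist` again.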

edited Oct 29 '20 at 18:35

answered Oct 29 '20 at 18:24

