Item frequency count in Python

Question

Assume I have a list of words, and I want to find the number of times each word appears in that list.

An obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

But I don't find this code very good, because the program runs through the word list more than once: once to build the set, and then again for every unique word to count its appearances.

Of course, I could write a function to run through the list and do the counting, but that wouldn't be so Pythonic. So, is there a more efficient and Pythonic way?

You may be interested in: http://stackoverflow.com/a/20308657/2534876 for issues of performance. — JDong, Dec 31 '14 at 05:31

score 150 · Accepted Answer · edited Apr 25 '19 at 21:53


The Counter class in the collections module is purpose-built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
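Because `Counter` is a `dict` subclass, the result can be used like a normal dictionary. The sketch below (the name `counts` is just illustrative) shows a few things that make it convenient for frequency counts: missing keys read as 0 instead of raising `KeyError`, `most_common()` returns entries ordered by count, and `update()` adds to existing counts, which helps when words arrive in batches.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

print(counts["apple"])        # 2
print(counts["cherry"])       # 0: missing keys count as zero, no KeyError
print(counts.most_common(2))  # [('apple', 2), ('banana', 2)]

# update() adds to the existing counts instead of replacing them
counts.update("lemon lemon".split())
print(counts["lemon"])        # 3
```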

edited Apr 25 '19 at 21:53 by Boris Verkhovskiy

answered May 21 '09 at 15:16 by sykora

According to http://stackoverflow.com/a/20308657/2534876, this is fastest on Python3 but slow on Python2. – JDong Dec 31 '14 at 05:34
Do you know if there is a flag to convert this to a percentage freq_dict? E.g., `'apple' : .3333 (2/6),` – Tommy Sep 23 '15 at 13:30
@Tommy `total = sum(your_counter_object.values())` then `freq_percentage = {k: v/total for k, v in your_counter_object.items()}` – Boris Verkhovskiy Apr 25 '19 at 03:00
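Following the comment above, a minimal sketch of turning the counts into relative frequencies (the names `counts`, `total` and `freq_percentage` are illustrative). On Python 3.10+, `counts.total()` gives the same value as `sum(counts.values())`; the sketch uses `sum()` so it also runs on older versions.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

total = sum(counts.values())  # 6 words in total
freq_percentage = {k: v / total for k, v in counts.items()}

for word, share in freq_percentage.items():
    print(f"{word}: {share:.4f}")
```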

score 95 · Answer 2 · answered May 21 '09 at 15:10


defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).
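One thing to keep in mind with `defaultdict`: simply reading a missing key inserts it with the default value. A short sketch of that behaviour, using `.get()` to read without inserting and `dict()` to convert the result to a plain dictionary afterwards (variable names are our own):

```python
from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

print(d.get("cherry", 0))  # 0, and d is left unchanged
print("cherry" in d)       # False
print(d["cherry"])         # 0, but this lookup inserts 'cherry' with value 0
print("cherry" in d)       # True

plain = dict(d)  # a regular dict: missing keys raise KeyError again
print(plain)     # {'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1, 'cherry': 0}
```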

answered May 21 '09 at 15:10 by Kenan Banks

This is a very old answer. Use `Counter` instead. – Boris Verkhovskiy Apr 25 '19 at 20:52

Answer 3 · answered Jun 11 '09 at 20:32 · hopla

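words = "apple banana apple strawberry banana lemon"
freqs = {}
for word in words.split():  # split first; iterating the raw string would count characters
    freqs[word] = freqs.get(word, 0) + 1  # fetch and increment OR initialize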

I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable IMHO, and almost identical to Thomas Weigel's solution, but without using exceptions.

However, this could be slower than using defaultdict() from the collections module, since the value is fetched, incremented, and then assigned again instead of just being incremented. Then again, `d[word] += 1` on a defaultdict also fetches the value and assigns it back, so the difference may be smaller than it looks.
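To check the performance question on your own Python version, a small `timeit` sketch comparing the `.get()` loop, `defaultdict`, and `Counter` (the function names and the repeated test data are hypothetical). Timings depend heavily on the interpreter and the data, so no numbers are claimed here.

```python
import timeit
from collections import Counter, defaultdict

# Hypothetical test data: the example sentence repeated 1000 times
words = ("apple banana apple strawberry banana lemon " * 1000).split()

def count_with_get(items):
    freqs = {}
    for word in items:
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

def count_with_defaultdict(items):
    d = defaultdict(int)
    for word in items:
        d[word] += 1
    return d

def count_with_counter(items):
    return Counter(items)

for fn in (count_with_get, count_with_defaultdict, count_with_counter):
    seconds = timeit.timeit(lambda: fn(words), number=100)
    print(f"{fn.__name__}: {seconds:.3f} s")
```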

score 11 · Answer 4 · edited Oct 08 '19 at 06:37


Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1

print(result)

Groupby one-liner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

result = {key: len(list(group)) for key, group in groupby(sorted(words))}
print(result)
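`groupby` only groups *consecutive* equal items, which is why the one-liner sorts first. A quick sketch of what happens without `sorted()`: every word becomes its own group, and passing that to `dict()` would silently keep only the last count for each key.

```python
from itertools import groupby

words = "apple banana apple strawberry banana lemon".split()

unsorted_groups = [(key, len(list(group))) for key, group in groupby(words)]
print(unsorted_groups)
# [('apple', 1), ('banana', 1), ('apple', 1), ('strawberry', 1), ('banana', 1), ('lemon', 1)]

sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(words))]
print(sorted_groups)
# [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]
```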

edited Oct 08 '19 at 06:37 by Community

answered May 21 '09 at 15:11 by nosklo

Is there a difference in complexity? Does groupby use sorting? Then it seems to need O(n log n) time? – Daniyar May 21 '09 at 15:27
Oops, it seems Nick Presta below has pointed out that the groupby approach uses O(n log n). – Daniyar May 21 '09 at 15:35

score 7 · Answer 5 · answered May 21 '09 at 15:09

If you don't want to use the standard dictionary method (looping through the list and incrementing the corresponding dict key), you can try this:

>>> from itertools import groupby
>>> words = "apple banana apple strawberry banana lemon"
>>> myList = words.split()  # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time because of the sort.
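If the alphabetically sorted listing is what you want, a sketch of an alternative: count in one O(n) pass with `Counter`, then sort only the k distinct words, for O(n + k log k) overall.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"

# Count in a single pass, then sort only the distinct (word, count) pairs
result = sorted(Counter(words.split()).items())
print(result)  # [('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]
```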

score 3 · Answer 6 · edited May 21 '09 at 22:36


Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try:
        my_count[word] += 1
    except KeyError:
        my_count[word] = 1  # first time this word is seen

edited May 21 '09 at 22:36 by tzot

answered May 21 '09 at 15:59 by Thomas Weigel

Seems slower than defaultdict in my tests – nosklo May 21 '09 at 16:59
Splitting by a space is redundant. Also, you should use the `dict.setdefault` method instead of the try/except. – Kenan Banks May 21 '09 at 17:05

It's a lot slower because you are using Exceptions. Exceptions are very costly in almost any language. Avoid using them for logic branches. Look at my solution for an almost identical method, but without using Exceptions: http://stackoverflow.com/questions/893417/item-frequency-count-in-python/983434#983434 – hopla Jun 11 '09 at 20:30
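For reference, a sketch of the `dict.setdefault` variant suggested in the comments; it avoids raising `KeyError` by inserting a starting value of 0 only when the word is not yet a key.

```python
words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    my_count.setdefault(word, 0)  # insert 0 only if the word is not yet a key
    my_count[word] += 1
print(my_count)  # {'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}
```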

score 2 · Answer 7 · answered Aug 07 '21 at 07:10


user_input = input().split()

for word in user_input:
    print('{} {}'.format(word, user_input.count(word)))
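Note that this prints a line for every occurrence, so repeated words show up more than once, and each `count()` call rescans the list. A sketch that prints each word once, in order of first appearance (relying on `Counter` keeping insertion order, as dicts do since Python 3.7). The helper `word_report` is hypothetical and takes the line as a parameter so it can be tried without `input()`.

```python
from collections import Counter

def word_report(line):
    """Return 'word count' strings, one per distinct word, in first-seen order."""
    counts = Counter(line.split())
    return ['{} {}'.format(word, n) for word, n in counts.items()]

# Example usage; pass input() instead of the literal for interactive use
for row in word_report("apple banana apple strawberry banana lemon"):
    print(row)
```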

answered Aug 07 '21 at 07:10 by dB_19

score 1 · Answer 8 · edited Nov 16 '19 at 14:04


words = "apple banana apple strawberry banana lemon"
word_list = words.split()
unique_words = set(word_list)
word_freqs = {}
for word in unique_words:
    word_freqs[word] = word_list.count(word)
print(word_freqs)

Hope this helps!

edited Nov 16 '19 at 14:04 by user2922935

answered Nov 12 '17 at 16:17 by Varun Shaandhesh

score 0 · Answer 9 · edited Jun 26 '15 at 06:56

I happened to be working on a Spark exercise; here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

print({n: float(tokens.count(n)) / len(tokens) for n in tokens})

Output of the above:

{'quick': 0.16666666666666666, 'brown': 0.16666666666666666, 'fox': 0.16666666666666666, 'jumps': 0.16666666666666666, 'lazy': 0.16666666666666666, 'dog': 0.16666666666666666}

score 0 · Answer 10 · edited Oct 15 '21 at 10:03


Use reduce() to convert the list to a single dict.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns

{'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}

edited Oct 15 '21 at 10:03 by theherk

answered Feb 23 '16 at 18:03 by Gadi

score 0 · Answer 11 · answered Apr 07 '11 at 05:36


Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1
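One caveat with calling `count` on the raw string: `str.count` counts substring occurrences, not whole words. A short sketch (the sentence is made up for illustration) contrasting it with counting in the split list:

```python
sentence = 'the other thing is there'

print(sentence.count('the'))          # 3: also matches inside 'other' and 'there'
print(sentence.split().count('the'))  # 1: only the whole word 'the'
```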

answered Apr 07 '11 at 05:36 by Antonio

The question already uses "count", and asks for better alternatives. – Daniyar May 22 '11 at 21:15

score 0 · Answer 12 · answered Oct 11 '21 at 01:11


line = input()  # providing user input passes multiple tests; named `line` to avoid shadowing built-in list
text = line.split()

for word in text:
    freq = text.count(word)
    print(word, freq)

answered Oct 11 '21 at 01:11 by PanamaPHat

score 0 · Answer 13 · answered Apr 17 '22 at 01:35

I had a similar assignment on zyBooks; this is the solution that worked for me.

def build_dictionary(words):
    counts = dict()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1  # first occurrence of this word
    return counts


if __name__ == '__main__':
    words = input().split()
    your_dictionary = build_dictionary(words)
    sorted_keys = sorted(your_dictionary.keys())
    for key in sorted_keys:
        print(key + ':' + str(your_dictionary[key]))

score -1 · Answer 14 · edited Feb 27 '13 at 02:38

The code below takes some extra cycles, but it is another method:

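def func(tup):
    # sort key: the count stored in the last element of each (word, count) pair
    return tup[-1]


def print_words(filename):
    with open(filename, 'r') as f:  # use the filename argument and close the file afterwards
        whole_content = f.read().lower()
    print(whole_content)
    list_content = whole_content.split()
    word_counts = {}  # avoid shadowing the built-in dict
    for one_word in list_content:
        word_counts[one_word] = 0
    for one_word in list_content:
        word_counts[one_word] += 1
    print(list(word_counts.items()))
    print(sorted(word_counts.items(), key=func))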
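If the goal is counting words in a file, a sketch that reads line by line and counts in a single pass instead of two loops. The function name `count_words_in_lines` is hypothetical, and lowercasing is carried over from the answer above as an assumption.

```python
from collections import Counter

def count_words_in_lines(lines):
    """Count lowercased words from any iterable of text lines (e.g. an open file)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

# Usage with a file (the path is hypothetical):
# with open("small.txt") as f:
#     print(sorted(count_words_in_lines(f).items(), key=lambda kv: kv[1]))

print(count_words_in_lines(["The cat", "the dog"]).most_common(1))  # [('the', 2)]
```

Iterating over an open file yields one line at a time, so the whole file never has to be held in memory as a single string.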
