Using a dictionary to count the items in a list

Question

Suppose I have a list of items, like:

['apple', 'red', 'apple', 'red', 'red', 'pear']

I want a dictionary that counts how many times each item appears in the list. So for the list above the result should be:

{'apple': 2, 'red': 3, 'pear': 1}

How can I do this simply in Python?

If you are only interested in counting instances of a single element in a list, see "How do I count the occurrences of a list item?".

you can get inspiration here: http://stackoverflow.com/questions/2870466/python-histogram-one-liner — mykhal, Aug 16 '10 at 19:23
http://stackoverflow.com/questions/13242103/how-to-compute-letter-frequency-in-a-string-using-pythons-built-in-map-and-reduc — Andrew Tonko, Aug 16 '15 at 08:47

score 377 · Answer 1 · edited Jul 18 '22 at 03:17

Since Python 2.7 and 3.1, the collections module provides Counter (a dict subclass) for exactly this purpose.

>>> from collections import Counter
>>> Counter(['apple','red','apple','red','red','pear'])
Counter({'red': 3, 'apple': 2, 'pear': 1})
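
Because Counter is a dict subclass, it can be used wherever a dict is expected. A minimal sketch of a few things it offers on top of plain counting (the variable names are only illustrative):

```python
from collections import Counter

items = ['apple', 'red', 'apple', 'red', 'red', 'pear']
counts = Counter(items)

# A Counter behaves like a dict, but a missing key reads as 0
# instead of raising KeyError (and the key is not inserted).
apple_count = counts['apple']
banana_count = counts['banana']

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = counts.most_common(2)

# dict() gives a plain dictionary with the same contents.
plain = dict(counts)
print(plain)
```

If only the plain dictionary is needed, the dict() call at the end is the only extra step.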

edited Jul 18 '22 at 03:17

Daniel Walker

answered Aug 16 '10 at 20:00

Odomontois

5

The official line, or rather standing joke, is that Guido has a time machine .. – Muhammad Alkarouri Aug 17 '10 at 00:04
23

@Glenn Maynard Counter is just an implementation of a **multiset** which is not an uncommon data structure IMO. In fact, C++ has an implementation in the STL called `std::multiset` (also `std::tr1::unordered_multiset`) so Guido is not alone in his opinion of its importance. – awesomo Oct 18 '11 at 03:07
11

@awesomo: No, it's not comparable to std::multiset. std::multiset allows storing multiple distinct but comparatively equal values, which is what makes it so useful. (For example, you can compare a list of locations by their temperature, and use a multiset to look up all locations at a specific temperature or temperature range, while getting the fast insertions of a set.) Counter merely counts repetitions; distinct values are lost. That's much less useful--it's nothing more than a wrapped dict. I question calling that a multiset at all. – Glenn Maynard Oct 18 '11 at 15:23
2

@GlennMaynard You're right, I overlooked the additional (extremely useful) features of std::multiset. – awesomo Oct 18 '11 at 16:11
1

This is the correct Pythonista way of doing it. Efficient. Most of the other solutions listed work, but are not scalable. Exponentially less efficient. Attend MIT OCW "Introduction to Algorithms" to find out why. – imbatman Jan 18 '18 at 10:17
6

Counting might be a narrow task, but one that is required very often. – Radio Controlled Mar 26 '19 at 08:36
For clarity: to get the output you want you need to cast it to a dict, like dict(Counter(['apple','red','apple','red','red','pear'])). – Chiel Dec 18 '22 at 21:18

mmmdreg · Answer 2 · 2013-05-22T06:41:38.840

326

I like:

counts = dict()
for i in items:
  counts[i] = counts.get(i, 0) + 1

.get allows you to specify a default value if the key does not exist.
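
Wrapped in a function, the same idea might look like the sketch below. The name count_items is made up for illustration; the input is the list from the question.

```python
def count_items(items):
    """Return a plain dict mapping each item to its number of occurrences."""
    counts = {}
    for item in items:
        # get() returns the current count, or 0 if the item has not been seen yet.
        counts[item] = counts.get(item, 0) + 1
    return counts

result = count_items(['apple', 'red', 'apple', 'red', 'red', 'pear'])
print(result)
```

Note that get() only reads from the dictionary; the assignment on the same line is what stores the updated count.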

edited May 22 '13 at 06:41

answered Jul 05 '11 at 12:44

mmmdreg

33

For those new to python. This answer is better in terms of time complexity. – curiousMonkey Apr 18 '16 at 05:07
1

This answer works even on a list of floating point numbers, where some of the numbers may be '0' – SherylHohman May 03 '17 at 05:12
5

This answer also does not require any extra imports. +1 – Hayden Holligan Jan 17 '19 at 18:39
1

I don't understand what the +1 part does. Could someone explain? – Jonas Palačionis Apr 22 '20 at 14:43
@JonasPalačionis get(i, 0) returns 0 if i is not yet in the dict. So the count starts at 0, and 1 is added each time the item is seen – Algorithman Oct 09 '20 at 01:57
1

@JonasPalačionis: It increments the counter for that key, before assigning back to the value for that key. i.e. it's a histogram aka frequency-count. – Peter Cordes Jul 30 '22 at 04:13
It's true that this solution saves an `import`, and the comment on time complexity may also be valid. In most cases, however, these considerations are less important than readability and elegance, and the accepted `Counter` answer would be more appropriate. – Michael Scheper Nov 08 '22 at 15:31
so far best answer. – novice Apr 19 '23 at 14:36

score 75 · Answer 3 · edited Aug 21 '18 at 13:20

Simply use the list method count:

i = ['apple','red','apple','red','red','pear']
d = {x: i.count(x) for x in i}
print(d)

output (Python 3.7+, where dicts keep insertion order):

{'apple': 2, 'red': 3, 'pear': 1}
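
One caveat with this approach: i.count(x) scans the whole list every time it is called, and the comprehension calls it once per element, so the total work grows quadratically with the length of the list. A rough sketch of how one might check this with timeit, using a single-pass loop for comparison (the function names and test data are invented here, and the timings will vary by machine):

```python
import timeit

def count_with_list_count(items):
    # Calls items.count() once per element; each call scans the whole list.
    return {x: items.count(x) for x in items}

def count_single_pass(items):
    # Visits each element exactly once.
    counts = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1
    return counts

data = [n % 100 for n in range(2000)]  # 2000 items, 100 distinct values

# Both approaches produce the same dictionary.
same = count_with_list_count(data) == count_single_pass(data)

slow = timeit.timeit(lambda: count_with_list_count(data), number=3)
fast = timeit.timeit(lambda: count_single_pass(data), number=3)
print(same, f"list.count: {slow:.4f}s", f"single pass: {fast:.4f}s")
```

For a six-element list the difference is negligible; it only starts to matter as the input grows.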

edited Aug 21 '18 at 13:20

JFMR

answered Mar 29 '16 at 12:24

Ashish Kumar Verma

33

You're applying `count` against the array as many times as there are array items. Your solution is `O(n^2)` where the better trivial solution is `O(n)`. See comments on [riviera's answer](https://stackoverflow.com/a/9604768/367865) versus comments on [mmmdreg's answer](https://stackoverflow.com/a/6582852/367865). – Ouroborus Nov 29 '17 at 09:50
5

Maybe you could do `d = {x:i.count(x) for x in set(i)}` – Xenia Ioannidou Jul 29 '21 at 18:38
2

@XeniaIoannidou: That does `O(n * unique_elements)` work; not much better unless you have many repeats. And still bad; building a `set()` is basically adding elements to a hash table without a count. Almost as much work as just adding them to a Dictionary of counts and incrementing the count if already present, and that's just for making the set. What I described for adding to a Dictionary is already a full solution to the histogram problem, and you're done there without any time spent scanning the original array for each unique element. – Peter Cordes Jul 30 '22 at 04:11

score 63 · Answer 4 · answered Aug 16 '10 at 19:22

>>> L = ['apple','red','apple','red','red','pear']
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in L:
...   d[i] += 1
...
>>> d
defaultdict(<class 'int'>, {'apple': 2, 'red': 3, 'pear': 1})
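
This works because defaultdict calls its factory, here int, whenever a missing key is looked up, and int() returns 0. A small sketch of that behaviour, including one side effect worth knowing about (merely reading a missing key inserts it); the variable names are illustrative only:

```python
from collections import defaultdict

d = defaultdict(int)

# int() with no arguments returns 0, so every new key starts at 0.
zero = int()

for word in ['apple', 'red', 'apple']:
    d[word] += 1

# Reading a missing key inserts it with the default value.
missing = d['pear']
keys_after_read = sorted(d)

# dict() drops the default factory and gives an ordinary dictionary.
plain = dict(d)
print(plain)
```

To test for a key without inserting it, `'pear' in d` or `d.get('pear')` can be used instead of indexing.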

answered Aug 16 '10 at 19:22

mechanical_meat

@NickT It's more cluttered than itertools.Counter - and I'd be surprised if it was faster... – Shadow Sep 12 '19 at 01:56
1

By `itertools.Counter` I think @Shadow meant `collections.Counter` – Intrastellar Explorer Jul 17 '22 at 19:14

score 32 · Answer 5 · answered Aug 17 '10 at 12:25

I always thought that for a task this trivial, I wouldn't want to import anything. But I may be wrong, depending on whether collections.Counter is faster or not.

items = "Whats the simpliest way to add the list items to a dictionary "

stats = {}
for i in items:
    if i in stats:
        stats[i] += 1
    else:
        stats[i] = 1

# bonus
for i in sorted(stats, key=stats.get):
    print("%d×'%s'" % (stats[i], i))

I think this may be preferable to using count(), because it will only go over the iterable once, whereas count may search the entire thing on every iteration. I used this method to parse many megabytes of statistical data and it always was reasonably fast.
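
A side benefit of the single pass is that the input does not need to be a list at all. As an illustration (the generator and its name are invented here), the same loop works on an iterator that can only be consumed once, where a count()-based approach would not apply:

```python
def characters(lines):
    # Hypothetical source: yields one character at a time and cannot be rewound.
    for line in lines:
        for ch in line:
            yield ch

stats = {}
for ch in characters(["abca", "ba"]):
    if ch in stats:
        stats[ch] += 1
    else:
        stats[ch] = 1

# Most frequent first.
ranked = sorted(stats, key=stats.get, reverse=True)
for ch in ranked:
    print("%d×'%s'" % (stats[ch], ch))
```

The same pattern would apply to lines read from a file object, which is also consumed as it is iterated.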

answered Aug 17 '10 at 12:25

Stefano Palazzo

2

Your answer deserves more credit for its simplicity. I was struggling over this for a while, getting bewildered with the silliness of some of the other users suggesting to import new libraries etc. – ntk4 Sep 23 '16 at 05:56
2

you could simplify it with a default value like this d[key] = d.get(key, 0) + 1 – merhoo Jan 22 '19 at 03:26
The simplicity of this answer is so underrated! Sometimes there is no need to import libraries and over-engineer simple tasks. – Madhavi Jouhari Aug 02 '21 at 11:15

Nick T · Answer 6 · 2010-08-17T21:25:09.740

4

L = ['apple','red','apple','red','red','pear']
d = {}
[d.__setitem__(item,1+d.get(item,0)) for item in L]
print(d)

Gives {'apple': 2, 'red': 3, 'pear': 1} (on Python 3.7+, where dicts keep insertion order)

edited Aug 17 '10 at 21:25

answered Aug 16 '10 at 19:24

Nick T

1

Please don't abuse list comprehensions for side effects like this. The imperative loop is much clearer, and does not create a useless temporary list of `None`s. – Karl Knechtel Jul 30 '22 at 21:45

score 1 · Answer 7 · answered Jul 30 '22 at 21:33

If you use NumPy, the unique function can tell you how many times each value appeared when you pass return_counts=True:

>>> import numpy as np
>>> data = ['apple', 'red', 'apple', 'red', 'red', 'pear']
>>> np.unique(data, return_counts=True)
(array(['apple', 'pear', 'red'], dtype='<U5'), array([2, 1, 3]))

The counts are in the same order as the distinct elements that were found; thus we can use the usual trick to create the desired dictionary (passing the two elements as separate arguments to zip):

>>> dict(zip(*np.unique(data, return_counts=True)))
{'apple': 2, 'pear': 1, 'red': 3}
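
The zip(*...) step can be seen in isolation with plain Python lists standing in for the two arrays that np.unique returns. The values below are copied from the output above; no NumPy is involved in this sketch.

```python
# Stand-in for the (values, counts) pair returned by np.unique above.
pair = (['apple', 'pear', 'red'], [2, 1, 3])

# zip(*pair) is the same as zip(pair[0], pair[1]):
# it pairs each value with the count at the same position.
pairs = list(zip(*pair))

# dict() accepts an iterable of (key, value) pairs.
counts = dict(pairs)
print(counts)
```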

If you specifically have a large input NumPy array of small non-negative integers, you may get better performance from bincount:

>>> data = np.random.randint(10, size=100)
>>> data
array([1, 0, 0, 3, 3, 4, 2, 4, 4, 0, 4, 8, 7, 4, 4, 8, 7, 0, 0, 2, 4, 2,
       0, 9, 0, 2, 7, 0, 7, 7, 5, 6, 6, 8, 4, 2, 7, 6, 0, 3, 6, 3, 0, 4,
       8, 8, 9, 5, 2, 2, 5, 1, 1, 1, 9, 9, 5, 0, 1, 1, 9, 5, 4, 9, 5, 2,
       7, 3, 9, 0, 1, 4, 9, 1, 1, 5, 4, 7, 5, 0, 3, 5, 1, 9, 4, 8, 8, 9,
       7, 7, 7, 5, 6, 3, 2, 4, 3, 9, 6, 0])
>>> np.bincount(data)
array([14, 10,  9,  8, 14, 10,  6, 11,  7, 11])

The nth value in the output array indicates the number of times that n appeared, so we can create the dictionary if desired using enumerate:

>>> dict(enumerate(np.bincount(data)))
{0: 14, 1: 10, 2: 9, 3: 8, 4: 14, 5: 10, 6: 6, 7: 11, 8: 7, 9: 11}
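
To make the indexing concrete, here is a pure-Python sketch of the idea behind bincount. It is not how NumPy implements it, and the helper name is made up; it only assumes the inputs are non-negative integers.

```python
def bincount_sketch(values):
    # One slot per possible value from 0 up to the largest value seen.
    counts = [0] * (max(values) + 1) if values else []
    for v in values:
        counts[v] += 1  # the value itself is the index
    return counts

sample = [1, 0, 0, 3, 3, 4]
result = bincount_sketch(sample)
as_dict = dict(enumerate(result))
print(result)
print(as_dict)
```

One difference from the dictionary-based approaches: a value that never occurs but is smaller than the maximum (2 in this sample) still gets an entry, with a count of 0.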

score 0 · Answer 8 · edited Aug 02 '23 at 12:26

That is an easy answer m8!

def equalizeArray(arr):
    # Counting the frequency of each element in the array
    freq = {}
    for i in arr:
        if i not in freq:
            freq[i] = 1
        else:
            freq[i] += 1
    # Finding the highest frequency
    max_freq = max(freq.values())
    # Printing every element that occurs with that frequency
    for key, value in freq.items():
        if value == max_freq:
            print(key, "was repeated:", value, "times")

edited Aug 02 '23 at 12:26

toyota Supra

answered Jul 28 '23 at 18:54

Leo negao

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Aug 03 '23 at 06:55

Harry Rashid · Answer 9 · 2023-02-07T20:28:49.433

-1

mylist = [1,2,1,5,1,1,6,'a','a','b']
result = {}
for i in mylist:
    result[i] = mylist.count(i)
print(result)

edited Feb 07 '23 at 20:28

answered Feb 07 '23 at 20:25

Harry Rashid

1

No, not a good idea. Runtime complexity is O(n^2) which pretty much defeats the point of using the dictionary in the first place. Same problem as this answer: https://stackoverflow.com/a/36284223/ – General Grievance Feb 08 '23 at 13:32
[A code-only answer is not high quality](//meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers). While this code may be useful, you can improve it by saying why it works, how it works, when it should be used, and what its limitations are. Please [edit] your answer to include explanation and link to relevant documentation. – Stephen Ostermiller Feb 09 '23 at 10:37
