python histogram one-liner

Question

There are many ways to write a Python program that computes a histogram.

By histogram, I mean a function that counts the occurrence of objects in an iterable and outputs the counts in a dictionary. For example:

>>> L = 'abracadabra'
>>> histogram(L)
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}

One way to write this function is:

def histogram(L):
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d

Are there more concise ways of writing this function?

If we had dictionary comprehensions in Python, we could write:

>>> { x: L.count(x) for x in set(L) }

but since Python 2.6 doesn't have them, we have to write:

>>> dict([(x, L.count(x)) for x in set(L)])

Although this approach may be readable, it is not efficient: L is traversed once for every distinct element. Furthermore, it won't work for single-use iterators such as generators; the function should work equally well for a generator such as:

def gen(L):
    for x in L:
        yield x
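
To make the objection concrete, here is a minimal sketch in Python 3 syntax. The name histogram_single_pass is only illustrative; it is not from the original post. The count-based expression needs both set(...) and .count(...), so it cannot handle a generator, while a single pass works for any iterable:

```python
def gen(L):
    for x in L:
        yield x

def histogram_single_pass(iterable):
    # Illustrative helper: one pass over the input.
    # dict.get supplies 0 for keys that have not been seen yet.
    d = {}
    for x in iterable:
        d[x] = d.get(x, 0) + 1
    return d

g = gen('abracadabra')
try:
    # set(g) exhausts the generator, and generators have no .count method.
    counts = dict((x, g.count(x)) for x in set(g))
except AttributeError as exc:
    counts = None
    error = exc

result = histogram_single_pass(gen('abracadabra'))
```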

We might try to use the reduce function (R.I.P.):

>>> reduce(lambda d,x: dict(d, x=d.get(x,0)+1), L, {}) # wrong!

Oops, this does not work: the key name is 'x', not x. :(
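
A small sketch (Python 3 syntax) of why that attempt goes wrong: in a call like dict(d, x=...), the keyword name is taken literally, so the key is the string 'x' and not the value bound to x. The variable names wrong and right are just for illustration:

```python
d = {}
x = 'a'

# The keyword argument's name becomes the key: the string 'x'.
wrong = dict(d, x=d.get(x, 0) + 1)

# A (key, value) pair uses the value bound to x instead.
right = dict(list(d.items()) + [(x, d.get(x, 0) + 1)])
```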

I ended up with:

>>> reduce(lambda d,x: dict(d.items() + [(x, d.get(x, 0)+1)]), L, {})

(In Python 3, we would have to write list(d.items()) instead of d.items(), but it's hypothetical, since there is no reduce there.)

Please beat me with a better, more readable one-liner! ;)

"one liner" and "more readable" aren't mutually exclusive, but they're close — msw, May 20 '10 at 01:26
Not an answer, just some comments: First, dict((x, L.count(x)) for x in set(L)) works perfectly well (at least in 2.6 or so, possibly earlier versions too), so there's no need to introduce the extra list in your example above. Secondly, if you don't care about one-liners then this is a job tailor-made for defaultdict from the collections module. Replace d = {} with d = collections.defaultdict(int) in your original histogram function, and then you can skip the if x in d: bit. — Peter Milley, May 20 '10 at 01:30
Peter Milley: your almost-dict-comprehension works even in Python 2.5.2! Thanks, I was not aware of this syntax — mykhal, May 20 '10 at 01:38

score 78 · Accepted Answer · answered May 20 '10 at 01:33

Python 3.x does have reduce, you just have to do a from functools import reduce. It also has "dict comprehensions", which have exactly the syntax in your example.
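
A short sketch of both points, assuming Python 3 and the question's 'abracadabra' input. The reduce call is the question's own one-liner with list(d.items()) substituted, as the question suggested:

```python
from functools import reduce

L = 'abracadabra'

# Dict comprehension: readable, but calls L.count once per distinct element.
by_comprehension = {x: L.count(x) for x in set(L)}

# The question's reduce one-liner; it builds a new dict at every step.
by_reduce = reduce(
    lambda d, x: dict(list(d.items()) + [(x, d.get(x, 0) + 1)]), L, {}
)
```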

Python 2.7 and 3.x also have a Counter class which does exactly what you want:

from collections import Counter
cnt = Counter("abracadabra")
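
For illustration, a few things a Counter can do beyond plain counting. This sketch assumes Python 3; Counter is a dict subclass, so it converts to a plain dict directly, and it also accepts one-shot iterables such as generators:

```python
from collections import Counter

cnt = Counter("abracadabra")

as_dict = dict(cnt)          # plain dict with the same counts
top = cnt.most_common(1)     # list of (element, count), highest count first
missing = cnt['z']           # unseen elements count as 0 instead of raising

# Works on a generator too, since the input is consumed in a single pass.
from_gen = Counter(c for c in "abracadabra")
```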

In Python 2.6 or earlier, I'd personally use a defaultdict and do it in 2 lines:

from collections import defaultdict
d = defaultdict(int)  # missing keys start at int() == 0
for x in xs: d[x] += 1

That's clean, efficient, Pythonic, and much easier for most people to understand than anything involving reduce.

Eli Courtwright

Python 2.7 also has dict comprehensions. – Grant Paul May 20 '10 at 02:30
What is the variable xs in "for x in xs" ? – RobyB Nov 19 '22 at 21:59
@RobyB: The "xs" variable in this example is a list. In many code examples, "x" is a single item and "xs" is a list or array - this convention comes originally from the Haskell programming language. – Eli Courtwright Nov 28 '22 at 23:56

John La Rooy · Answer 2 · 2010-08-18T04:52:15.800

8

It's kinda cheaty to import modules for one-liners, so here's a one-liner that is O(n) and works at least as far back as Python 2.4:

>>> f=lambda s,d={}:([d.__setitem__(i,d.get(i,0)+1) for i in s],d)[-1]
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}

And if you think __ methods are hacky, you can always do this

>>> f=lambda s,d=lambda:0:vars(([setattr(d,i,getattr(d,i,0)+1) for i in s],d)[-1])
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}

:)
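
One caveat about the first lambda, shown as a sketch in Python 3 syntax: the default d={} is created once, when the lambda is defined, so counts accumulate across calls unless a fresh dict is passed in. The names first, second and fresh are only illustrative:

```python
f = lambda s, d={}: ([d.__setitem__(i, d.get(i, 0) + 1) for i in s], d)[-1]

# f returns the shared default dict itself, so copy it to take a snapshot.
first = dict(f("ABRACADABRA"))
second = dict(f("ABRACADABRA"))  # the counts from the first call are still in d

# Passing a new dict explicitly avoids the shared state.
fresh = f("ABRACADABRA", {})
```

The second lambda shares its default in the same way, since its default function object d is also created only once.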

edited Aug 18 '10 at 04:52

answered Aug 18 '10 at 04:47

John La Rooy

Cool indeed, but I have to agree with @msw's comment on readability. If I saw someone push this to our repo, I would have a serious discussion with him... – RickyA Jan 24 '13 at 13:01

score 6 · Answer 3 · answered Feb 20 '15 at 15:18

import pandas as pd

pd.Series(list(L)).value_counts()

mirandes

score 6 · Answer 4 · edited May 30 '13 at 03:31

$d{$_} += 1 for split //, 'abracadabra';

edited May 30 '13 at 03:31

Sean Vieira

answered Nov 18 '10 at 02:33

perl

@perl I think you should take this novelty account further – BC. May 16 '13 at 19:50
Why did an answer written in an unrelated programming language with no explanation get 9 upvotes? – Karl Knechtel Aug 01 '22 at 21:13

score 5 · Answer 5 · answered Aug 15 '13 at 21:44

For Python 2.7, you can use this small dict comprehension:

v = list('abracadabra')
print {x: v.count(x) for x in set(v)}

Walter Cacau

I find this to be the most elegant solution. Nice! – Ohumeronen Jul 07 '16 at 13:22

score 4 · Answer 6 · answered Dec 13 '12 at 19:04

One that works back to Python 2.3 (slightly shorter than Timmerman's and, I think, more readable):

L = 'abracadabra'
hist = {}
for x in L: hist[x] = hist.pop(x,0) + 1
print hist
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}

dgulino

PaulMcG · Answer 7 · 2010-05-20T15:58:53.870

1

For a while there, anything using itertools was by definition Pythonic. Still, this is a bit on the opaque side:

>>> from itertools import groupby
>>> grouplen = lambda grp : sum(1 for i in grp)
>>> hist = dict((a[0], grouplen(a[1])) for a in groupby(sorted("ABRACADABRA")))
>>> print hist
{'A': 5, 'R': 2, 'C': 1, 'B': 2, 'D': 1}

I'm currently running Python 2.5.4.
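
A sketch of the same idea in Python 3 syntax, showing why the sorted call matters: groupby only groups consecutive equal items, so unsorted input yields one group per run and later runs overwrite earlier ones in the dict. The name unsorted_hist is only illustrative:

```python
from itertools import groupby

def grouplen(grp):
    # Count the items in a group without building a list.
    return sum(1 for _ in grp)

s = "ABRACADABRA"

# Sorting first puts equal items next to each other (this is the O(n log n) step).
hist = {key: grouplen(grp) for key, grp in groupby(sorted(s))}

# Without sorting, every run in this string has length 1.
unsorted_hist = {key: grouplen(grp) for key, grp in groupby(s)}
```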

edited May 20 '10 at 15:58

answered May 20 '10 at 02:21

PaulMcG

This solution is O(n log n). There are several simpler linear solutions provided here. – Mike Graham May 20 '10 at 03:27
@Mike - are you sure? Beware of lurking complexities. Iterating over the list is obviously O(n), but what is the complexity of the repeated looking up of each key in the summarizing dict? It's not O(1). – PaulMcG May 20 '10 at 12:32
Looking up dict keys is O(1). – Mike Graham May 20 '10 at 20:03
This solution (without the sorted call, of course) is ok when your iterable is already sorted, otherwise it's too expensive, as Mike stated. – tokland Sep 08 '10 at 13:42

tokland · Answer 8 · 2011-03-26T21:39:44.073

1

Your one-liner using reduce was almost ok, you only needed to tweak it a little bit:

>>> reduce(lambda d, x: dict(d, **{x: d.get(x, 0) + 1}), L, {})
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}

Of course, this won't beat in-place solutions (neither in speed nor in Pythonicity), but in exchange you get a nice purely functional snippet. BTW, this would be somewhat prettier if Python had a dict.merge() method.
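
A note for Python 3 readers, as a sketch rather than part of the original answer: there, dict(d, **{...}) only accepts string keys, so the one-liner fails for inputs such as a list of integers. Dict unpacking (Python 3.5+) keeps the same purely functional shape and works for any hashable key. The function name histogram is illustrative:

```python
from functools import reduce

def histogram(iterable):
    # {**d, x: n} builds a new dict at each step; nothing is mutated in place.
    return reduce(lambda d, x: {**d, x: d.get(x, 0) + 1}, iterable, {})

try:
    dict({}, **{1: 1})          # keyword names must be strings in Python 3
    non_string_keys_ok = True
except TypeError:
    non_string_keys_ok = False
```

Copying the dict at every step makes this slower than the in-place loops, which matches the trade-off described above.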

edited Mar 26 '11 at 21:39

answered Sep 06 '10 at 14:58

tokland

tokland, isn't `dict.update()` the same as what you mean by `dict.merge()`? – sblom Jun 15 '11 at 04:43
@sblom: you've killed a functional cat ;-) dict.update() works in-place while dict.merge() wouldn't (check Ruby's Hash#merge, Hash#update). Even if we didn't care for purity, as dict.update() does not return the updated dict, it couldn't be used in a one-liner lambda. – tokland Jun 15 '11 at 07:33
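
The distinction drawn in that comment can be sketched as follows. The merge function here is hypothetical; it is not a real dict method, only an illustration of what a non-mutating counterpart to dict.update() might look like:

```python
def merge(d, other):
    # Hypothetical non-mutating merge: copy d, then apply other to the copy.
    result = dict(d)
    result.update(other)
    return result

base = {'a': 1}
returned = base.update({'b': 2})   # mutates base in place and returns None

snapshot = dict(base)
merged = merge(base, {'a': 5})     # base itself is left untouched
```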

Jens Timmerman · Answer 9 · 2012-02-21T16:41:52.123

I needed a histogram implementation to work in python 2.2 up to 2.7, and came up with this:

>>> L = 'abracadabra'
>>> hist = {}
>>> for x in L: hist[x] = hist.setdefault(x,0)+1
>>> print hist
{'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}

I was inspired by Eli Courtwright's post about defaultdict. It was introduced in Python 2.5, so it can't be used here, but it can be emulated with dict.setdefault(key, default).

This is basically the same thing gnibbler is doing, but I had to write this first before I could completely understand his lambda function.
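
For reference, the same setdefault loop wrapped in a function, as a sketch in Python 3 syntax (the function name histogram is illustrative). Because it makes a single pass, it also accepts generators:

```python
def histogram(iterable):
    hist = {}
    for x in iterable:
        # setdefault returns the current count, storing 0 first if x is new.
        hist[x] = hist.setdefault(x, 0) + 1
    return hist

from_string = histogram('abracadabra')
from_generator = histogram(c for c in 'aab')
```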
