41

I've got a list of integers and I want to be able to identify contiguous blocks of duplicates: that is, I want to produce an order-preserving list of duples where each duples contains (int_in_question, number of occurrences).

For example, if I have a list like:

[0, 0, 0, 3, 3, 2, 5, 2, 6, 6]

I want the result to be:

[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]

I have a fairly simple way of doing this with a for-loop, a temp, and a counter:

result_list = []
current = source_list[0]
count = 0
for value in source_list:
    if value == current:
        count += 1
    else:
        result_list.append((current, count))
        current = value
        count = 1
result_list.append((current, count))

But I really like python's functional programming idioms, and I'd like to be able to do this with a simple generator expression. However I find it difficult to keep sub-counts when working with generators. I have a feeling a two-step process might get me there, but for now I'm stumped.

Is there a particularly elegant/pythonic way to do this, especially with generators?

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
machine yearning
  • 9,889
  • 5
  • 38
  • 51

1 Answers1

79
>>> from itertools import groupby
>>> L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
>>> grouped_L = [(k, sum(1 for i in g)) for k,g in groupby(L)]
>>> # Or (k, len(list(g))), but that creates an intermediate list
>>> grouped_L
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]

Batteries included, as they say.

Suggestion for using sum and generator expression from JBernardo; see comment.

jscs
  • 63,694
  • 13
  • 151
  • 195
  • 10
    +1, maybe you could change `len(list(g))` for `sum(1 for i in g)` to avoid intermediate storage. – JBernardo Jun 15 '11 at 03:03
  • 2
    @JBernardo: Good suggestion, thanks. Creating a list from `g` has always kind of bothered me when I use `groupby` for this. – jscs Jun 15 '11 at 03:06
  • @JBernardo: Actually I'm gonna go with creating the intermediate list. Although perhaps doing the sum would be more efficient, I think the former is far more readable (really states exactly what we want to happen) and thus more pythonic! I do think that this "adding ones" solution is hinting at something lacking in generators, specifically that there's no way of explicitly, with a built-in function, telling how many elements will be generated. Might this be amended in the future? – machine yearning Jun 15 '11 at 06:55
  • 2
    @machine: It's in principle impossible. Consider: `def long_gen(): while True: yield 1` What is the `len` of this? See: http://stackoverflow.com/questions/390852/is-there-any-built-in-way-to-get-the-length-of-an-iterable-in-python – jscs Jun 15 '11 at 06:56
  • @Josh: I see now, if it's a well-established idiom then I take back my previous comment on pythonicity. I suppose the halting problem could throw a wrench into any general-case amendment one might make to this, too. Thanks for the thoughtful response! – machine yearning Jun 15 '11 at 07:07
  • 1
    @machine: You're welcome. I've seen this use of `sum` in other places but hadn't thought to use it in this case. I think it would be quickly understood by most readers. – jscs Jun 15 '11 at 07:14