Finding consecutively repeating strings in Python list

Question

What is the most efficient way to find consecutively repeating strings in a Python list?

For example, suppose I have the list ["a", "a", "b", "c", "b","b","b"]. I want an output of something like: ["group of 2 a's found at index 0, group of 3 b's found at index 4'].

Is there a built in function to accomplish this task? I did find numpy.bincount, but that seems to only work on numeric values.

Thanks in advance for the help.

possible duplicate of [How to count the frequency of the elements in a list?](http://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list) — Nir Alfasi, Aug 22 '14 at 01:55
Yes, only consecutive repetitions. **alfasin**, I did see that thread, but the solutions seemed to count all repetitions, not just consecutive ones. — Pythontology, Aug 22 '14 at 02:02

Ry- · Accepted Answer · 2014-08-22T02:22:14.153

10

It’s funny that you should call it a group, because the function probably best-suited to this is itertools.groupby:

>>> import itertools
>>> items = ["a", "a", "b", "c", "b", "b", "b"]
>>> [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]
[('a', 2), ('b', 1), ('c', 1), ('b', 3)]

(sum(1 for _ in vs) is a count, by the way, since len doesn’t work on just any iterable, and len(list(…)) is wasteful.)

Getting the index is a little more complicated; I’d just do it using a loop.

import itertools

def group_with_index(l):
    i = 0

    for k, vs in itertools.groupby(l):
        c = sum(1 for _ in vs)
        yield (k, c, i)
        i += c

edited Aug 22 '14 at 02:22

answered Aug 22 '14 at 01:58

Ry-

218,210
55
464
476

I don't think this gives you only consecutive repetitions, does it? – Patrick Collins Aug 22 '14 at 01:59
@PatrickCollins: It does! That’s why you generally pass something sorted to `groupby`. (See the example output, too.) – Ry- Aug 22 '14 at 01:59
Is there a quick way to get the indices of the consecutive blocks, as in my post? I suppose that shouldn't be too hard to write myself, but I'd prefer to use the (more efficient) built-in method, if it exists! – Pythontology Aug 22 '14 at 02:03
@Pythontology: Hm… I’ve been trying a few things, and there doesn’t seem to be an efficient way that’s simpler than a loop. – Ry- Aug 22 '14 at 02:19

score 1 · Answer 2 · answered Aug 22 '14 at 02:17

This requires state information between elements of the loop so its not easy to do with a list comprehension. Instead you can keep track of last value in a loop:

groups = []
for i, val in enumerate(["a", "a", "b", "c", "b","b","b"]):
    if i == 0:
         cnt = 1
         loc = i
         last_val = val
    elif val == last_val:
         cnt += 1
    else:
         groups.append((cnt, last_val, loc))
         cnt = 1
         loc = i
         last_val = val

for group in groups:
     print("group of {0} {1}'s found at index {2}".format(*group)

Output:

group of 2 a's found at index 0
group of 1 b's found at index 2
group of 1 c's found at index 3

Finding consecutively repeating strings in Python list

2 Answers2

Linked