2

What is the most efficient way to find consecutively repeating strings in a Python list?

For example, suppose I have the list ["a", "a", "b", "c", "b","b","b"]. I want an output of something like: ["group of 2 a's found at index 0, group of 3 b's found at index 4'].

Is there a built in function to accomplish this task? I did find numpy.bincount, but that seems to only work on numeric values.

Thanks in advance for the help.

Pythontology
  • 1,494
  • 2
  • 12
  • 15
  • possible duplicate of [How to count the frequency of the elements in a list?](http://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list) – Nir Alfasi Aug 22 '14 at 01:55
  • Only consecutive repeated elements? Not just anywhere? – Patrick Collins Aug 22 '14 at 01:57
  • Yes, only consecutive repetitions. **alfasin**, I did see that thread, but the solutions seemed to count all repetitions, not just consecutive ones. – Pythontology Aug 22 '14 at 02:02

2 Answers2

10

It’s funny that you should call it a group, because the function probably best-suited to this is itertools.groupby:

>>> import itertools
>>> items = ["a", "a", "b", "c", "b", "b", "b"]
>>> [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]
[('a', 2), ('b', 1), ('c', 1), ('b', 3)]

(sum(1 for _ in vs) is a count, by the way, since len doesn’t work on just any iterable, and len(list(…)) is wasteful.)

Getting the index is a little more complicated; I’d just do it using a loop.

import itertools

def group_with_index(l):
    i = 0

    for k, vs in itertools.groupby(l):
        c = sum(1 for _ in vs)
        yield (k, c, i)
        i += c
Ry-
  • 218,210
  • 55
  • 464
  • 476
  • I don't think this gives you only consecutive repetitions, does it? – Patrick Collins Aug 22 '14 at 01:59
  • @PatrickCollins: It does! That’s why you generally pass something sorted to `groupby`. (See the example output, too.) – Ry- Aug 22 '14 at 01:59
  • Is there a quick way to get the indices of the consecutive blocks, as in my post? I suppose that shouldn't be too hard to write myself, but I'd prefer to use the (more efficient) built-in method, if it exists! – Pythontology Aug 22 '14 at 02:03
  • @Pythontology: Hm… I’ve been trying a few things, and there doesn’t seem to be an efficient way that’s simpler than a loop. – Ry- Aug 22 '14 at 02:19
1

This requires state information between elements of the loop so its not easy to do with a list comprehension. Instead you can keep track of last value in a loop:

groups = []
for i, val in enumerate(["a", "a", "b", "c", "b","b","b"]):
    if i == 0:
         cnt = 1
         loc = i
         last_val = val
    elif val == last_val:
         cnt += 1
    else:
         groups.append((cnt, last_val, loc))
         cnt = 1
         loc = i
         last_val = val

for group in groups:
     print("group of {0} {1}'s found at index {2}".format(*group)

Output:

group of 2 a's found at index 0
group of 1 b's found at index 2
group of 1 c's found at index 3
Robb
  • 433
  • 2
  • 9