How to group a run of items in python

Question

Say I want to group together ints which are separated by less than a certain threshold. My concrete use case is identifying the largest chunks of uncovered code in test coverage results, e.g.:

groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3)
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

score 1 · Answer 1 · answered Mar 12 '14 at 21:40

You could use Raymond Hettinger's cluster function:

def cluster(data, maxgap, key=None):
    """Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    http://stackoverflow.com/a/14783998/190597 (Raymond Hettinger)
    """
    data.sort()
    groups = [[data[0]]]
    for item in data[1:]:
        if key:
            val = key(item, groups[-1])
        else:
            val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups

data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158]
print(cluster(data, maxgap=3))

yields

[[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

score 0 · Answer 2 · answered Mar 12 '14 at 21:37

itertools.groupby can help us here. The only difficulty is that groupby groups by a "key" which needs to be calculated for each item, such that each group is made of consecutive items with the same key. This means that our keyfunc object needs to save state to perform this task:

class runner(object):
     def __init__(self, threshold=1):
         self.threshold = threshold
         self.last = None
         self.key = None
     def __call__(self,item):
         if self.last is None:
             self.last = item
             self.key = item
             return item
         if item - self.last <= self.threshold:
             self.last = item
             return self.key
         else:
             self.last = item
             self.key = item
             return item

The basic idea is that if we're within the threshold of the last item, we return the current key, which is the first item of the current run.

Let's see that in action:

[list(g) for k,g in itertools.groupby((int(s) for s in '53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158'.split(',')), runner(3))]
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

score 0 · Answer 3 · answered Mar 24 '14 at 17:06

How about this without using any module:

NB: assuming they are already sorted.

#!/usr/bin/python

group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)

def group_runs(group, step):
    mark = [0]
    diff = map(lambda x: (x[1] - x[0]), zip(group[:],group[1:]))
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    return [list(group[x[0]:x[1]]) for x in zip(mark[::], mark[1::])]

print group_runs(group, 3)

Output:

 [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323,
   325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870],
  [875], [884], [947], [993], [1134], [1139], [1148]]

This is not easily read. Readability counts. You might like to simplify AND explain this code. Also side effects in list comprehensions are discouraged unless there's a specific reason for it. — Marcin, Mar 24 '14 at 20:54

How to group a run of items in python

3 Answers3