13

In Python, given a list of sorted integers, I would to group them by consecutive values and tolerate gaps of 1.

For instance, given a list my_list:

In [66]: my_list
Out[66]: [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]

I would like the following output:

[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]

Now, if I didn't have to tolerate gaps of 1, I could apply the neat solution explained here:

import itertools
import operator
results = []
for k, g in itertools.groupby(enumerate(my_list), lambda (i,x):i-x):
        group = map(operator.itemgetter(1), g)
        results.append(group)

Is there a way to incorporate my extra requirement in the above solution? If not, what's the best way to tackle the problem?

Community
  • 1
  • 1
Ricky Robinson
  • 21,798
  • 42
  • 129
  • 185

6 Answers6

14

When in doubt you can always write your own generator:

def group_runs(li,tolerance=2):
    out = []
    last = li[0]
    for x in li:
        if x-last > tolerance:
            yield out
            out = []
        out.append(x)
        last = x
    yield out

demo:

list(group_runs(my_list))
Out[48]: [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
roippi
  • 25,533
  • 4
  • 48
  • 73
11

Numpy is a very useful tool, and not very difficult to learn.

This problem is solvable in O(n) with a single line of code (excluding imports, data, and converting to list - if you really need it):

from numpy import array, diff, where, split
l= [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
result= split(l, where(diff(l)>2)[0]+1 )
print map(list, result)

More importantly, the code is very fast if you need to process large lists, unlike a pure-python solution

loopbackbee
  • 21,962
  • 10
  • 62
  • 97
  • Thanks for pointing this out. Before I dig into numpy's tutorial to understand each function you used, I just wanted to tell you that the above code yields the wrong result: `[[0, 1, 2], [3, 5], [6, 10], [11, 15], [16, 18, 19, 20]]` – Ricky Robinson Jan 15 '14 at 16:51
  • 1
    @RickyRobinson You're right, I managed to have two off-by-one errors in one line :) I've corrected it – loopbackbee Jan 15 '14 at 17:03
  • A single line that depends on importing a massive library is fine if you are already using that library but quite likely not appropriate if it is the only line that depends on it. Unless, as you mention – Kevin Whitefoot Jan 21 '14 at 09:02
6

Remember, groupby in itself, is pretty lame. The strength of itertools.groupby is the key. For this particular problem, you need to create an appropriate key with memory (stateless key will not work here).

Implementation

class Key(object):
    def __init__(self, diff):
        self.diff, self.flag, self.prev = diff, [0,1], None
    def __call__(self, elem):
        if self.prev and abs(self.prev - elem) > self.diff:
            self.flag = self.flag[::-1]
        self.prev= elem
        return self.flag[0]

Object

[list(g) for k, g in groupby(my_list, key = Key(2))]
[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]

How it Works

Every time, a new sub-list needs to be created (curr - prev > threshold), you toggle a flag. There are different ways to toggle a flag

  • flag = 1; flag *= -1
  • flag = [0, 1 ]; flag = flag[::-1]
  • flag = 0; flag = 0 if flag else 1

Choose what ever your heart contends

So this generates an accompanying key along with your list

[0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0, 1,  1,  0,  0,  0,  0 , 0]
             <------>  <------>
          Toggle flag  Toggle flag
          0 -> 1, as   1 -> 0, as
          10 - 6 > 2   15 - 11 > 2

Now as itertools.groupby, groups consecutive elements with same key, all elements with keys having consecutive 0s or 1s are grouped together

[0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0, 1,  1,  0,  0,  0,  0 , 0]

[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]
[0, 0, 0, 0, 0, 0], [1,  1],  [0,  0,  0,  0 , 0]

And your final result would be

[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]
Abhijit
  • 62,056
  • 18
  • 131
  • 204
  • This is absolutely the best solution; unlike the accepted solution, it uses constant memory regardless of how long a "run" is. It does depend on an assumption that groupby only calls `key` once per element, but that's a very fair assumption under any sane implementation. – btown May 04 '16 at 16:28
3

An O(nlogn) solution (assuming the input list isn't sorted) is to first the sort the list you're given, then iterate through each value, creating a new group whenever the difference between the current value and the previous value is at least 3.

Demo

>>> my_list = [0, 1, 2, 3, 5, 6, 10, 11, 15, 16, 18, 19, 20]
>>> my_list.sort() # if we can't assume the list is sorted beforehand
>>> groups = [[my_list[0]]] # initialize with the first value in the list
>>> for i, val in enumerate(my_list[1:]):
...     if val - groups[-1][-1] > 2:
...         groups.append( [val] ) # create a new group
...     else:
...         groups[-1].append( val ) # append to the most recent group
... 
>>> groups
[[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
mdml
  • 22,442
  • 8
  • 58
  • 66
1

I generally use zip when I want to deal with consecutive elements, and you can use islice you want to avoid building the list slice:

from itertools import islice

def group(lst, tol=1):
    """Group vals in sorted iterable lst, allow tol between consecutive vals."""
    output = [[]]
    for current, next_ in zip(lst, islice(lst, 1, None)):
        output[-1].append(current)
        if next_ > current + tol + 1:
            output.append([])
    output[-1].append(lst[-1])
    return output

Note that in Python 2.x, you need to use itertools.izip to avoid building the list of 2-tuples (current, next_).

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
1

Here's what I came up with. There's a bit of verbose initialization but it gets the job done. =)

output = []
prev = my_list[0]
temp_list = [my_list[0]]

for num in my_list[1:]:
    if num-2 > prev:
        output += [temp_list]
        temp_list = [num]
    else:
        temp_list.append(num)
    prev = num
output.append(temp_list)

print output

# [[0, 1, 2, 3, 5, 6], [10, 11], [15, 16, 18, 19, 20]]
Chris Arena
  • 1,602
  • 15
  • 17