Find number runs with customizable distance between numbers

Question

I have a sorted list of integers and I find want find to the number runs in this list. I have seen many examples when looking for numbers runs that increment by 1, but I also want to look for number runs where the difference between numbers is customizable.

For example, say I have the following list of numbers:

nums = [1, 2, 3, 6, 7, 8, 10, 12, 14, 18, 25, 28, 31, 39]

Using the example found here, I'm able to find the following number runs:

[[1, 2, 3], [6, 7, 8], [10], [12], [14], [18], [25], [28], [31], [39]]

However, I want to look for number runs where the difference between two numbers could be more than just 1. For example, I want all number runs with a distance of less than or equal to 2.

[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]

Or maybe I want all number runs with a distance of less than or equal to 3.

[[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]

Here is the function that I'm working with now to get number runs with a distance of 1.

def runs(seq, n):
    result = []
    for s in seq:
        if not result or s != result[-1][-1] + n:
            # Start a new run if we can't continue the previous one.
            result.append([])
        result[-1].append(s)
    return result

With the current function, if I set n=1, then I find all consecutive number sequences. If I set n=2, then I only find [8, 10, 12, 14]. How can I modify this function to find number runs that are less than or equal to n?

I want to be able to do this:

runs(num, 2)
[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]

Isn't it just `s > result[-1][-1] + n` instead of `s != result[-1][-1] + n`? — John Zwinck, Jun 10 '18 at 03:30
Just wanted to point out that this is a really well written, high quality question, well done! — user3483203, Jun 10 '18 at 03:59
I believe your example for `<=3` is wrong as in `[..., [25, 28], [31], ...]`, 31 is only a distance of 3 away from 28. — AGN Gazer, Jun 10 '18 at 04:17

cs95 · Accepted Answer · 2018-06-10T04:18:40.750

Iterative grouping with `for`

I'm just fixing your code with this one. To simplify things, you can initialise result with seq[0].

def runs(seq, n):
    result = [[seq[0]]]
    for s in seq[1:]:
        if s - result[-1][-1] > n:  # Keep it simple. Compare the delta.
            result.append([])
        result[-1].append(s)

    return result

>>> runs(nums, 1)
[[1, 2, 3], [6, 7, 8], [10], [12], [14], [18], [25], [28], [31], [39]]
>>> runs(nums, 2)
[[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]

`pandas.GroupBy`

If you want to get fancy, you can use the groupby idiom, made easy with pandas.

import pandas as pd

def runs2(seq, n):
    s = pd.Series(seq)
    return s.groupby(s.diff().gt(n).cumsum()).apply(pd.Series.tolist).tolist()

>>> runs2(nums, 3)
[[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]

There are two essential ingredients here: the grouper (the predicate you're grouping on), and the agg function (the function you'll apply on each group)

The grouper is s.diff().gt(n).cumsum(), broken down calculates three things:

The element-wise difference in seq using diff
A boolean mask indicating whether the diff is greater than n
Performing a cumulative sum (or count) to determine the groups

The output of this operation is

s.diff().gt(n).cumsum()

0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     2
10    3
11    4
12    5
13    6
dtype: int64

The agg function is pd.Series.tolist, and will convert any series to a list. That's what we need here, a nested list.

AGN Gazer · Answer 2 · 2018-06-10T04:22:28.157

2

def runs(nums, n):
    idx = np.flatnonzero(np.ediff1d(nums, n + 1, n + 1) > n)
    return [nums[i1:i2] for i1, i2 in zip(idx[:-1], idx[1:])]

Then,

>>> runs(nums, 3)
[[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]

edited Jun 10 '18 at 04:22

answered Jun 10 '18 at 04:12

AGN Gazer

8,025
2
27
45

score 1 · Answer 3 · answered Jun 10 '18 at 04:38

In [4]: def runs(seq, n):
   ...:     indexs = [i for i in range(len(seq)) if i==0 or seq[i]-seq[i-1]>n]
   ...:     return [seq[a:b] for a, b in zip(indexs, indexs[1:]+[len(seq)])]
   ...: 
   ...: 

In [5]: runs(nums, 3)
Out[5]: [[1, 2, 3, 6, 7, 8, 10, 12, 14], [18], [25, 28, 31], [39]]

In [6]: runs(nums, 2)
Out[6]: [[1, 2, 3], [6, 7, 8, 10, 12, 14], [18], [25], [28], [31], [39]]

Find number runs with customizable distance between numbers

3 Answers3

Iterative grouping with for

pandas.GroupBy

Iterative grouping with `for`

`pandas.GroupBy`