0

With numpy's where() function I'm getting the indices of an array by applying a mask. For example, an index list, where (1, 2, 3, 10, 11, 12, 13) where True:

[1, 2, 3, 10, 11, 12, 13]

Is there a way to transform this (or directly get it from a similar function) in separate ranges, for my example:

[ range(1, 4), range(10, 14)  ]
mathfux
  • 5,759
  • 1
  • 14
  • 34
Patrick B.
  • 11,773
  • 8
  • 58
  • 101
  • I am assuming you are not interested in simply iterating it? – Serial Lazer Nov 10 '20 at 11:16
  • 2
    Does this answer your question? [Identify groups of continuous numbers in a list](https://stackoverflow.com/questions/2154249/identify-groups-of-continuous-numbers-in-a-list) – Flavio Moraes Nov 10 '20 at 11:19
  • Could be. Thanks. – Patrick B. Nov 10 '20 at 11:20
  • @FlavioMoraes It's not quite the same because it doesn't discuss a variant of identical problem on `numpy`. – mathfux Nov 10 '20 at 11:46
  • I believe any algo will encounter ambiguity. Eg, with `[1, 3, 15, 5, 6]`, one possible answer could be `[range(1,4,2), range(15,16,1), range(5,7,1)]`, and another possible answer could be `[range(1,4,2), range(15,4,-10), range(6,7,1)]`. Are you ok with either of these answers? – fountainhead Nov 10 '20 at 11:55
  • Are the numbers guaranteed to be in non-decreasing order? – fountainhead Nov 10 '20 at 11:57
  • With `[1, 3, 15, 20, 21]`, a "greedy" algo will always (unambiguously) produce an answer of `[range(1,4,2), range(15,21,5), and range(21,22,1)]`. After identifying `range(1,4,2)`, when it sees `15`, it would "greedily" treat `15, 20` to be part of one new range (rather than let go of `20` to become a part of the next range). Would such a greedy approach be ok? – fountainhead Nov 10 '20 at 12:08
  • No, I need only ranges with step=1. No greed necessary. – Patrick B. Nov 10 '20 at 13:52

2 Answers2

2

Assuming all the ranges include unit differences only, it's worth to start with a search of indexes where to split:

split_idx = np.flatnonzero(np.diff(ls, prepend = ls[0], append=ls[-1])!=1)

Note that it is improved slightly in order to insert starting and ending indexes as well. A further work could be done like so:

bounding_points = np.transpose([split_idx[:-1], split_idx[1:]])
out = [range(ls[n], ls[m-1]+1) for n, m in bounding_points.tolist()]

Sample run:

>>> ls = [1, 2, 3, 10, 11, 12, 13, 16, 18, 19]
>>> split_idx
array([ 0,  3,  7,  8, 10], dtype=int32)
>>> bounding_points
array([[ 0,  3],
       [ 3,  7],
       [ 7,  8],
       [ 8, 10]], dtype=int32)
>>> out
[range(1, 4), range(10, 14), range(16, 17), range(18, 20)]
mathfux
  • 5,759
  • 1
  • 14
  • 34
  • I believe you may be able to get the split points also by passing `n=2` to `np.diff()`, and comparing result of `np.diff()` to `0` rather than `1`. That would probably make the solution work even if the range steps are greater than 1. I would try it out myself, but my numpy is old, and doesn't even accept the `prepend` arg !! – fountainhead Nov 10 '20 at 12:39
  • @fountainhead `prepend` and `append` arguments are actually good for a demonstration purpose only. They are not efficient. You could do it better using `np.r_[item1, arr1, arr2, ..., arrn, item2]`. And you can't pass `n=2` inside [`np.diff`](https://numpy.org/doc/stable/reference/generated/numpy.diff.html) because it has a different meaning (read the docs). – mathfux Nov 10 '20 at 12:53
  • Ok, I'll check again. I thought passing `n=2` would produce second order differences (differences of differences). (Hence that suggestion) – fountainhead Nov 10 '20 at 12:54
  • @fountainhead I like extensible functions in answers, but I think extending to sequences with `step > 1` is adding a bit too much complexity based on the question – Daniel F Nov 10 '20 at 12:58
  • @DanielF -- I believe right now the assumption of step = 1 is based on OP's *example* rather than explicit statement of fact. Anyways, I'll leave this here. Thanks. – fountainhead Nov 10 '20 at 13:01
1

reading the link I sent you I got this answer:

from operator import itemgetter
from itertools import groupby

data = [1, 2, 3, 10, 11, 12, 13]
ranges = []
for key, group in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = list(map(itemgetter(1), group))
    if len(group) > 1:
        ranges.append(range(group[0], group[-1]+1))
    else:
        ranges.append(group[0])

print(ranges)

the output was

[range(1, 4), range(10, 14)]
Flavio Moraes
  • 1,331
  • 1
  • 5
  • 16