0

I have a sorted list of numbers:

[ 1,  2,  3,  4,  6,  7,  8,  9, 14, 15, 16, 17, 18, 25, 
 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]

I want to know if numpy has a built-in feature to get something like this out of it:

[ [1, 4], [6, 9], [14, 18], [25,36], [38], [45] ]

Also great if it can ignore holes of certain 2-3 numbers missing in between would still make the range.

I am basically listing frame numbers of a video for processing -- so rather than list out all the frame numbers I would jut put in ranges and its okay if a 3-4 frames in between are missing.

Just want to know if there something that already implements this logic as it seems like a common thing people would want to do - otherwise, I'll implement it myself.

Edit: found a very close question that's answered already: converting a list of integers into range in python

DMin
  • 10,049
  • 10
  • 45
  • 65
  • 1
    Similar to https://stackoverflow.com/questions/4628333/converting-a-list-of-integers-into-range-in-python – rassar Feb 18 '22 at 04:19

2 Answers2

5

Since it's tagged as numpy, here is a numpy solution (sort of). There is no native numpy function but you could use diff + where + split + list comprehension:

>>> [[ary[0], ary[-1]] if len(ary)>1 else [ary[0]] for ary in np.split(arr, np.where(np.diff(arr) != 1)[0] + 1)]
[[1, 4], [6, 9], [14, 18], [25, 36], [38], [45]]

If the array is large, it's more efficient to use a loop rather than np.split, so you could use the function below which produces the same outcome as above:

def array_to_range(arr):
    idx = np.r_[0, np.where(np.diff(arr) != 1)[0]+1, len(arr)]
    out = []
    for i,j in zip(idx, idx[1:]):
        ary = arr[i:j]
        if len(ary) > 1:
            out.append([ary[0], ary[-1]])
        else:
            out.append([ary[0]])
    return out
  • It looks like an additional method does pretty much the same as `np.split` does internally. I conclude that exclusion of ending values in return requires to design an algorithm in Pythonic way which is not as performant as to return items of equal length. Python iterations are much slower than vectorised actions in `numpy` – mathfux Feb 18 '22 at 10:26
  • 1
    @mathfux that’s what I thought as well, but when I test it, np.split is significantly slower. For example, for a (100000,) array, it’s 53.9ms vs 152 ms. –  Feb 18 '22 at 11:06
1

numpy is not designed to work with to work with arrays that differs in size. It uses internally list.append to append multiple list to empty array using slicing. It really slows the things down and it's recommended to use only in case user is forced to.

Algorithm of np.split

To better see, how could you improve, let's take a look at general pattern of finding np.split-like indices of array split:

arr = [1, 2, 3, 4, 6, 7, 8, 9, 14, 15, 16, 17, 18, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 45]
div_points = np.flatnonzero(np.diff(arr)!=1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]

So now you've got start and end arguments of slices that split an array into multiple sub-arrays:

np.transpose([start_points, end_points])
>>> array([[ 0,  4],
           [ 4,  8],
           [ 8, 13],
           [13, 25],
           [25, 26],
           [26, 27]])

And there is a mechanism that np.split uses internally to split an array:

container = []
for start, end in np.transpose([start_points, end_points]):
    container.append(arr[start:end])

>>> container 
[[1, 2, 3, 4],
 [6, 7, 8, 9],
 [14, 15, 16, 17, 18],
 [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36],
 [38],
 [45]]

Variation of algorithm

To arrive at an output nearly similar to what you expect, you could modify algorithm of np.split like so:

div_points = np.flatnonzero(np.diff(arr)!=1) + 1
start_points = np.r_[0, div_points]
end_points = np.r_[div_points, len(arr)]
out = np.transpose([arr[start_points], arr[end_points - 1]])
out
>>> array([[ 1,  4],
           [ 6,  9],
           [14, 18],
           [25, 36],
           [38, 38],
           [45, 45]])
mathfux
  • 5,759
  • 1
  • 14
  • 34