Any easy way to transform a missing number sequence to its range?

Question

Suppose I have a list that goes like : ''' [1,2,3,4,9,10,11,20] ''' I need the result to be like : ''' [[4,9],[11,20]] ''' I have defined a function that goes like this :

def get_range(lst):
i=0
seqrange=[]
for new in lst:
    a=[]
    start=new
    end=new
    if i==0:
        i=1
        old=new
    else:
        if new - old >1:
            a.append(old)
            a.append(new)
    old=new
    if len(a):
        seqrange.append(a)
return seqrange

Is there any other easier and efficient way to do it? I need to do this in the range of millions.

Pranav Hosangadi · Accepted Answer · 2020-10-29T15:01:29.540

You can use numpy arrays and the diff function that comes along with them. Numpy is so much more efficient than looping when you have millions of rows.

Slight aside: Why are numpy arrays so fast? Because they are arrays of data instead of arrays of pointers to data (which is what Python lists are), because they offload a whole bunch of computations to a backend written in C, and because they leverage the SIMD paradigm to run a Single Instruction on Multiple Data simultaneously.

Now back to the problem at hand:

The diff function gives us the difference between consecutive elements of the array. Pretty convenient, given that we need to find where this difference is greater than a known threshold!

import numpy as np

threshold = 1
arr = np.array([1,2,3,4,9,10,11,20])

deltas = np.diff(arr)
# There's a gap wherever the delta is greater than our threshold
gaps = deltas > threshold 
gap_indices = np.argwhere(gaps)

gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1] 

# Finally, stack the two arrays horizontally
all_gaps = np.hstack((gap_starts, gap_ends))
print(all_gaps)
# Output: 
# [[ 4  9]
#  [11 20]]

You can access all_gaps like a 2D matrix: all_gaps[0, 1] would give you 9, for example. If you really need the answer as a list-of-lists, simply convert it like so:

all_gaps_list = all_gaps.tolist()
print(all_gaps_list)
# Output: [[4, 9], [11, 20]]

Comparing the runtime of the iterative method from @happydave's answer with the numpy method:

import random
import timeit

import numpy

def gaps1(arr, threshold):
    deltas = np.diff(arr)
    gaps = deltas > threshold 
    gap_indices = np.argwhere(gaps)
    gap_starts = arr[gap_indices]
    gap_ends = arr[gap_indices + 1] 
    all_gaps = np.hstack((gap_starts, gap_ends))
    return all_gaps

def gaps2(lst, thr):
    seqrange = []
    for i in range(len(lst)-1):
      if lst[i+1] - lst[i] > thr:
        seqrange.append([lst[i], lst[i+1]])
    return seqrange

test_list = [i for i in range(100000)]
for i in range(100):
    test_list.remove(random.randint(0, len(test_list) - 1))

test_arr = np.array(test_list)

# Make sure both give the same answer:
assert np.all(gaps1(test_arr, 1) == gaps2(test_list, 1))

t1 = timeit.timeit('gaps1(test_arr, 1)', setup='from __main__ import gaps1, test_arr', number=100)
t2 = timeit.timeit('gaps2(test_list, 1)', setup='from __main__ import gaps2, test_list', number=100)

print(f"t1 = {t1}s; t2 = {t2}s; Numpy gives ~{t2 // t1}x speedup")

On my laptop, this gives:

t1 = 0.020834800001466647s; t2 = 1.2446780000027502s; Numpy gives ~59.0x speedup

My word that's fast!

score 1 · Answer 2 · answered Oct 07 '20 at 15:47

1

There is iterator based solution. It'is allow to get intervals one by one:

flist =  [1,2,3,4,9,10,11,20]


def get_range(lst):
    start_idx = lst[0]
    for current_idx in flist[1:]:
        if current_idx > start_idx+1:
            yield [start_idx, current_idx]
        start_idx = current_idx


for inverval in get_range(flist):
    print(inverval)

answered Oct 07 '20 at 15:47

Stanislav Ivanov

1,854
1
16
22

молодец Stanislav – Eduards Oct 07 '20 at 15:51

score 0 · Answer 3 · answered Oct 07 '20 at 15:46

0

I don't think there's anything inefficient about the solution, but you can clean up the code quite a bit:

seqrange = []
for i in range(len(lst)-1):
  if lst[i+1] - lst[i] > 1:
    seqrange.append([lst[i], lst[i+1]])

answered Oct 07 '20 at 15:46

happydave

7,127
1
26
25

Thank you so much. Don't know the feasibility when using millions of numbers though, will update upon testing. – SidMenon Oct 07 '20 at 16:42

score 0 · Answer 4 · answered Oct 07 '20 at 16:32

I think this could be more efficient and a bit cleaner.

def func(lst):
    ans=0
    final=[]
    sol=[]
    for i in range(1,lst[-1]+1):
        if(i not in lst):
            ans+=1
            final.append(i)
        elif(i in lst and ans>0):
            final=[final[0]-1,i]
            sol.append(final)
            ans=0
            final=[]
        else:
            final=[]
    return(sol)

Any easy way to transform a missing number sequence to its range?

4 Answers4