I'm faced with a problem of finding discontinuities (gaps) of a given length in a sequence of numbers. So, for example, given [1,2,3,7,8,9,10]
and a gap of length=3
, I'll find [4,5,6]
. If the gap is length=4
, I'll find nothing. The real sequence is, of course, much longer. I've seen this problem in quite a few posts, and it had various applications and possible implementations.
One way I thought might work and should be relatively quick is to represent the complete set as a bit array containing 1 for available number and 0 for missing - so the above will look like [1,1,1,0,0,0,1,1,1,1]
. Then possibly run a window function that'll XOR mask an array of the given length with the complete set until all locations result in 1. This will require a single pass over the whole sequence in roughly ~O(n), plus the cost of masking in each run.
Here's what I managed to come up with:
def find_gap(array, start=0, length=10):
"""
array: assumed to be of length MAX_NUMBER and contain 0 or 1
if the value is actually present
start: indicates what value to start looking from
length: what the length the gap should be
"""
# create the bitmask to check against
mask = ''.join( [1] * length )
# convert the input 0/1 mapping to bit string
# e.g - [1,0,1,0] -> '1010'
bits =''.join( [ str(val) for val in array ] )
for i in xrange(start, len(bits) - length):
# find where the next gap begins
if bits[i] != '0': continue
# gap was found, extract segment of size 'length', compare w/ mask
if (i + length < len(bits)):
segment = bits[i:i+length]
# use XOR between binary masks
result = bin( int(mask, 2) ^ int(segment, 2) )
# if mask == result in base 2, gap found
if result == ("0b%s" % mask): return i
# if we got here, no gap exists
return -1
This is fairly quick for ~100k (< 1 sec). I'd appreciate tips on how to make this faster / more efficient for larger sets. thanks!