I started writing this before I saw that your intervals cannot overlap. This approach is a bit of overkill, but I'll leave it up since it seems wasteful to throw it away.
See the bottom for a short solution.
An OOP-ish way to do it:
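```python
class Interval:
    """Closed integer interval [left, right]; accepts strings or ints."""
    def __init__(self, left, right):
        self.left = int(left)
        self.right = int(right)

    def __contains__(self, x):
        # Enables `x in interval`, inclusive at both ends
        return self.left <= int(x) <= self.right


intervals = [['HE670029', '4095', '4096'],
             ['HE670029', '4098', '4099'],
             ['HE670029', '4102', '4102']]

# If the intervals aren't sorted, sort numerically by start first
# (the fields are strings, so convert them before comparing):
# cuts = [Interval(*x[1:]) for x in sorted(intervals, key=lambda i: int(i[1]))]
cuts = [Interval(*x[1:]) for x in intervals]

# This step is overkill, since we know our intervals can't overlap:
# it collects every integer covered by at least one interval.
breakpoints = [x for x in range(1, 5000) if any(x in cut for cut in cuts)]
```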
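To see what that scan actually produces, here is a self-contained run of the steps above on the sample data (the `Interval` class is repeated so the snippet runs on its own). It collects every covered integer, so the three sample intervals turn into five breakpoints:

```python
class Interval:
    """Closed integer interval [left, right]."""
    def __init__(self, left, right):
        self.left = int(left)
        self.right = int(right)

    def __contains__(self, x):
        return self.left <= int(x) <= self.right


intervals = [['HE670029', '4095', '4096'],
             ['HE670029', '4098', '4099'],
             ['HE670029', '4102', '4102']]

# Sort numerically by start; sorting the raw strings would misorder values
# with different digit counts (e.g. '999' > '4095' as strings).
cuts = [Interval(*x[1:]) for x in sorted(intervals, key=lambda i: int(i[1]))]

# Every integer in 1..4999 that falls inside at least one interval
breakpoints = [x for x in range(1, 5000) if any(x in cut for cut in cuts)]
print(breakpoints)  # [4095, 4096, 4098, 4099, 4102]
```

As far as I can tell, pairing these up two at a time (as `gen_segments` does) only recovers the original cuts when each interval covers at most two integers, as in the sample; for wider intervals the endpoint-based short solution at the bottom is the safer choice.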
```python
def gen_segments(breakpoints, id_='HE670029', start=0, end=5000):
    # Each (left, right) pair is one cut; yield the gap before it
    for pair in chunks(breakpoints, 2):
        if len(pair) < 2:  # last breakpoint may be a singleton
            pair += pair
        left, right = pair
        yield id_, start, left - 1
        start = right + 1
    # Whatever remains after the last cut
    yield id_, start, end
```
Here `chunks` is one of the several chunking recipes found on this page. Demo:
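`chunks` isn't defined in this answer. A minimal version, assuming the common slicing recipe (consecutive slices of length `n`, with a possibly shorter final slice), could look like this:

```python
def chunks(seq, n):
    """Yield successive n-sized slices of seq; the last one may be shorter."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


print(list(chunks([4095, 4096, 4098, 4099, 4102], 2)))
# [[4095, 4096], [4098, 4099], [4102]]
```

The short final slice is why `gen_segments` checks `len(pair) < 2`. Because each slice is a new list, `pair += pair` there doesn't modify the caller's `breakpoints`.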
```
list(gen_segments(breakpoints))
Out[258]:
[('HE670029', 0, 4094),
 ('HE670029', 4097, 4097),
 ('HE670029', 4100, 4101),
 ('HE670029', 4103, 5000)]
```
Like I said, the above is massive overkill. If you know your intervals don't overlap, you don't need a fancy `Interval` class or anything. Just do this:
```python
breakpoints = [int(x) for interval in intervals for x in interval[1:]]
```
And then proceed directly with `gen_segments`, above.
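Putting the short solution together, here is a self-contained sketch. It reuses `gen_segments` as written above and fills in `chunks` with a simple slicing version (my assumption; any of the chunk recipes mentioned above should behave the same here):

```python
def chunks(seq, n):
    """Yield successive n-sized slices of seq (assumed chunking recipe)."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def gen_segments(breakpoints, id_='HE670029', start=0, end=5000):
    """Yield the (id, start, end) segments lying between the cuts."""
    for pair in chunks(breakpoints, 2):
        if len(pair) < 2:  # last breakpoint may be a singleton
            pair += pair
        left, right = pair
        yield id_, start, left - 1
        start = right + 1
    yield id_, start, end


intervals = [['HE670029', '4095', '4096'],
             ['HE670029', '4098', '4099'],
             ['HE670029', '4102', '4102']]

# Non-overlapping intervals: the (start, end) endpoints alone are enough
breakpoints = [int(x) for interval in intervals for x in interval[1:]]

for segment in gen_segments(breakpoints):
    print(segment)
```

This prints the same four segments as the demo above. If the intervals can arrive out of order, sort them numerically by start (`key=lambda i: int(i[1])`) before building `breakpoints`.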