1

Lets say I have a rope of length 0-5000. I want to divide this rope up so that the intervals in the lists shown bellow are cut out and the remainder is returned:

My lists:

['HE670029', '4095', '4096']
['HE670029', '4098', '4099']
['HE670029', '4102', '4102']

Desired output (does not have to be a list, can be written to file each list on a new line):

['HE670029', '0', '4094']
['HE670029', '4097', '4097']
['HE670029', '4100', '4101']
['HE670029', '4103', '5000']

I tried manipulating dictionaries, but without success. I don't know how to get this into a format that would allow me to perform the required operation.

crysis405
  • 1,121
  • 2
  • 13
  • 26
  • Is your input guaranteed to be sorted? And can the intervals overlap? – user2357112 Mar 22 '14 at 00:24
  • The intervals cannot overlap (they are all distinct point/intervals along the rope), and most likely the input will be sorted, but it would be better if it did not have to be. – crysis405 Mar 22 '14 at 00:41

3 Answers3

1

It ain't pretty, but it works:

sections_to_cut = [
        ['HE670029', '4095', '4096'],
        ['HE670029', '4098', '4099'],
        ['HE670029', '4102', '4102']
    ]

ropes = {}
for rope in sections_to_cut:
    if rope[0] not in ropes: # could use default dict instead
        ropes[rope[0]] = []
    ropes[rope[0]].append((int(rope[1]), int(rope[2])))

cut_ropes = []

for rope_name, exclude_values in ropes.items():
    sorted_ex = sorted(exclude_values, key=lambda x: x[0])
    a = 0
    for i in sorted_ex:
        cut_ropes.append([rope_name, str(a), str(i[0]-1)])
        a = i[1] + 1
    cut_ropes.append([rope_name, str(a), str(5000)])

print(cut_ropes)
# [['HE670029', '0', '4094'], ['HE670029', '4097', '4097'], ['HE670029', '4100', '4101'], ['HE670029', '4103', '5000']]
CasualDemon
  • 5,790
  • 2
  • 21
  • 39
0

I started writing this before I saw that your intervals cannot overlap. This approach is a bit of overkill, but I'll leave it up since it seems wasteful to throw it away.

See bottom for short solution.

OOP-ish way to do things:

class Interval:
    def __init__(self,left,right):
        self.left = int(left)
        self.right = int(right)
    def __contains__(self,x):
        return self.left <= int(x) <= self.right

intervals = [['HE670029', '4095', '4096'],
['HE670029', '4098', '4099'],
['HE670029', '4102', '4102']]

#if intervals aren't sorted, then do:
#cuts = [Interval(*x[1:]) for x in sorted(intervals,key=lambda i: i[1])]
cuts = [Interval(*x[1:]) for x in intervals]

#this step is overkill, since we know our intervals can't overlap
breakpoints = [x for x in range(1,5000) if any(x in cut for cut in cuts)]

def gen_segments(breakpoints, id_='HE670029', start=0, end=5000 ):
    for pair in chunks(breakpoints,2):
        if len(pair) < 2: #last breakpoint may be singleton
            pair += pair
        left,right = pair
        yield id_, start, left-1
        start = right+1
    yield id_, start, end

chunks being one of several chunk recipes like found on this page. demo:

list(gen_segments(breakpoints))
Out[258]: 
[('HE670029', 0, 4094),
 ('HE670029', 4097, 4097),
 ('HE670029', 4100, 4101),
 ('HE670029', 4103, 5000)]

Like I said, the above is massively overkill. If you know your intervals don't overlap, you don't need a fancy Interval class or anything. Just do this:

breakpoints = [int(x) for interval in intervals for x in interval[1:]]

And then proceed directly with gen_segments, above.

Community
  • 1
  • 1
roippi
  • 25,533
  • 4
  • 48
  • 73
0

Ill not spoil it for you, but give you a hint. Given

xs = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102']]

The first and the last sections are easy to do. Just 0->the first node, and then the last one is to 5000. You need the interim values ...

First create functions that can extract the two end values of your rope:

def head(x): return int(x[1])
def last(x): return int(x[-1])

Now you will need to segrigate each subsequent line like so:

[a,b for a,b in zip(xs[:-1], xs[1:])]

Now since you have these values, you can go ahead and use the functions which you just created to exract the last and first values of each ...

[(last(a),head(b)) for (a,b) in zip(xs[:-1], xs[1:])]

These are not the values that you want is it? You need to shift a little here ...

[(last(a)+1,head(b)-1) for (a,b) in zip(xs[:-1], xs[1:])]

Finally, just put the right list in:

xM = [['HE670029', str(last(a)+1),str(head(b)-1)] for (a,b) in zip(xs[:-1], xs[1:])]    

Now you have 2 lists. xs and xM. I am sure you can loop through and combine them together ...Ypu can consideer playing with zip, list and concat if you want to improve the result.

ssm
  • 5,277
  • 1
  • 24
  • 42