0

I am looking into this problem:

Let's assume we have N segments on the X-axis with different start and end points. This is described with the next data structure:

[(start1, end1),..., (startN, endN)]

Now we want to calculate the whole size of such segments, but overlaps should be not be double counted.

Example

input: [(0,3),(1,2), (6,7)]
output: 4

Because segment (1,2) overlaps with segment (0,3), the distance between 0 and 3 is just 3, and then between 6 and 7 it is 1. So 3+1 = 4.

Question

Is there an algorithm to make this calculation for large input sizes?

  • This question is better suited for the computer sceince site: https://cs.stackexchange.com/help/on-topic – FlyingTeller Feb 03 '22 at 12:05
  • Do you know a-priori the max x coordinate? How large is large (1000s, millions,...)? What's the definition of nested. Is `(3,5)` nested in the two lines `(2,4), (4,6)` – FlyingTeller Feb 03 '22 at 12:06
  • 1
    This question should not have been closed (at least not for lacking debugging details). Not all on-topic questions are about a bug. – trincot Feb 03 '22 at 12:42
  • Related question: [Union of intervals](https://stackoverflow.com/questions/1034802/union-of-intervals) – Stef Feb 03 '22 at 13:54
  • A solution using sympy: `from sympy import Interval, Union`; `def combined_length(lst): return Union(*starmap(Interval, lst)).measure`; `print(combined_length([(0, 3), (1, 2), (6, 7)]))` – Stef Feb 03 '22 at 13:57
  • Also related: [Union of multiple ranges](https://stackoverflow.com/a/34279227/3080723) – Stef Feb 03 '22 at 14:05

1 Answers1

2

You can proceed as follows:

Extract both coordinates of each pair and mark whether they end a segment (flag) or not (when they start one). Sort these new pairs (x, end). Then process them. When the coordinate starts a segment, push the coordinate on a stack. When it ends a segment, pop the corresponding start coordinate from the stack. If the stack is empty add the difference between end and start coordinate to the total:

lst = [(0,3),(1,2), (6,7)]

stack = []
total = 0
for x, end in sorted((x, i) for pair in lst for i, x in enumerate(pair)):
    if end:
        start = stack.pop()
        if not len(stack):
            total += x - start
    else:
        stack.append(x)

print(total)  # 4
trincot
  • 317,000
  • 35
  • 244
  • 286