
I wrote this Python script to split overlapping ranges into unique ranges (this is the latest iteration). It produces correct output and outperforms the version given in the answer. I tested its output against a known-correct method's output and against the output of another brute-force approach; I confirmed the correctness of all methods, and my code is the most efficient one.

Imagine an infinite number of boxes arranged in a line and numbered. Each box can hold only one object: whatever was last put into it. All boxes are initially empty. Now imagine a list of triplets. The first two elements of each triplet are integers (the first no greater than the second), and each triplet is an instruction to put its third element into a range of boxes. The triplet (0, 10, 'A') means "put 'A' into boxes 0 to 10 (inclusive)"; after executing it, boxes 0 to 10 each contain an instance of 'A'. Empty boxes are to be ignored. The task is to describe the state of the boxes after executing all instructions, using the least number of triplets. The rules are simple; there are narrow cases and general cases:

Narrow case: given triplets (s1, e1, d1) and (s2, e2, d2), s1 < s2 < e1 < e2 is always False (all pairings in the input conform to this). There are four sub-cases:

  • Case 1: s1 = s2 and e2 < e1:

    Whichever ends first wins; the boxes always hold the value from the winner. Given [(0, 10, 'A'), (0, 5, 'B')], after execution the state of the boxes is [(0, 5, 'B'), (6, 10, 'A')].

  • Case 2: s1 < s2 and e2 < e1:

    Whichever ends first wins. Example: [(0, 10, 'A'), (5, 7, 'B')] -> [(0, 4, 'A'), (5, 7, 'B'), (8, 10, 'A')].

  • Case 3: s1 < s2 and e2 = e1:

    Same rule as above, but here it is a tie. In case of ties, whichever starts later wins: [(0, 10, 'A'), (6, 10, 'B')] -> [(0, 5, 'A'), (6, 10, 'B')].

  • Case 4: s1 = s2 and e1 = e2:

    This is special. In case of true ties, whichever comes later in the input wins; e.g. [(0, 5, 'A'), (0, 5, 'B')] -> [(0, 5, 'B')]. My code doesn't guarantee that the objects in the boxes come from the latest instruction in cases like this (but it does guarantee that every box holds only one object).

Additional rules:

  • If an instruction doesn't update anything, nothing changes:

    [(0, 10, 'A'), (5, 6, 'A')] -> [(0, 10, 'A')]

  • If there are gaps, leave them as is (those boxes are empty):

    [(0, 10, 'A'), (15, 20, 'B')] -> [(0, 10, 'A'), (15, 20, 'B')]

  • If e1 + 1 = s2 and d1 = d2, join them (to keep the number of triplets minimal):

    [(0, 10, 'A'), (11, 20, 'A')] -> [(0, 20, 'A')]

  • The general case is when the primary condition of the narrow case doesn't hold, i.e. when ranges truly intersect. Whichever starts later wins (see the dict sketch right after this list):

    [(0, 10, 'A'), (5, 20, 'B')] -> [(0, 4, 'A'), (5, 20, 'B')]
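
To make the semantics concrete: a triplet is just a compressed dictionary update, and executing the instructions is just applying those updates in the winning order (this is literally what my brute force below does). The general-case example, spelled out by hand:

state = {}
for start, end, data in [(0, 10, 'A'), (5, 20, 'B')]:
    state.update({n: data for n in range(start, end + 1)})
# state now maps 0..4 to 'A' and 5..20 to 'B',
# which compresses back to [(0, 4, 'A'), (5, 20, 'B')]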

The brute-force implementation is correct and can handle the general case, but it is slow:

def brute_force_discretize(ranges):
    numbers = {}
    # Sort by start ascending, end descending; the sort is stable, so true
    # ties keep their input order and the later update wins the dict merge.
    ranges.sort(key=lambda x: (x[0], -x[1]))
    for start, end, data in ranges:
        numbers |= {n: data for n in range(start, end + 1)}
    # Compress runs of consecutive keys holding the same value into triplets
    # (with ascending starts, new keys are only ever appended in ascending order).
    numbers = list(numbers.items())
    l = len(numbers)
    i = 0
    output = []
    while i < l:
        di = 0
        curn, curv = numbers[i]
        while i < l and curn + di == numbers[i][0] and curv == numbers[i][1]:
            i += 1
            di += 1
        output.append((curn, numbers[i-1][0], curv))
    return output
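
Incidentally, the compression loop at the end could also be written with itertools.groupby; compress_items below is not part of my code, just a rough sketch of the same idea (grouping consecutive positions that carry the same value):

from itertools import groupby

def compress_items(items):
    # items: (position, value) pairs in ascending position order,
    # e.g. list(numbers.items()) from above.
    output = []
    for _, group in groupby(enumerate(items), key=lambda t: (t[1][0] - t[0], t[1][1])):
        group = list(group)
        first, value = group[0][1]
        output.append((first, group[-1][1][0], value))
    return output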

A smarter implementation that is performant but only handles the narrow case:

from typing import Any, List, Tuple

def get_nodes(ranges: List[Tuple[int, int, Any]]) -> List[Tuple[int, int, Any]]:
    # Emit (position, is_end, data) nodes; False < True, so a start node
    # sorts before an end node at the same position.
    nodes = []
    for ini, fin, data in ranges:
        nodes.extend([(ini, False, data), (fin, True, data)])
    return sorted(nodes)


def merge_ranges(data: List[List[int | Any]], rng: List[int | Any]) -> None:
    # Append rng, or extend the previous triplet when rng is adjacent to it
    # and carries the same data (the "join" rule).
    if not data or rng[2] != (last := data[-1])[2] or rng[0] > last[1] + 1:
        data.append(rng)
    else:
        last[1] = rng[1]


def discretize_narrow(ranges):
    nodes = get_nodes(ranges)
    output = []
    stack = []    # data of the currently open, distinct ranges (innermost last)
    actions = []  # per start node: True if it actually pushed onto the stack
    for node, end, data in nodes:
        if not end:  # start node
            action = False
            if not stack or data != stack[-1]:
                if stack and start < node:
                    # flush the visible part of the enclosing range
                    merge_ranges(output, [start, node - 1, stack[-1]])
                stack.append(data)
                start = node
                action = True
            actions.append(action)
        elif actions.pop(-1):  # end node; act only if the matching start pushed
            if start <= node:
                merge_ranges(output, [start, node, stack.pop(-1)])
                start = node + 1
            else:
                stack.pop(-1)
    return output

The full script that generates narrow-case samples (make_sample) can be found in the linked GitHub page. Performance:

In [518]: sample = make_sample(2048, 65536, 16)

In [519]: %timeit descretize(sample)
4.46 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [520]: %timeit discretize_narrow(sample)
3.13 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [521]: list(map(tuple, discretize_narrow(sample))) == descretize(sample)
Out[521]: True

How can I make my code faster? The inputs are sorted in ascending order, but my code doesn't take advantage of that. If I split the top-level data into non-overlapping groups (exploiting the fact that the input is sorted), accumulate triplets on a stack while there are overlaps, push the stack through discretize_narrow and clear it once the overlaps are over, and then join the intermediate results, the code could be faster. I can't get this working.
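
The grouping step I have in mind looks roughly like this (a sketch only, assuming the input is non-empty and already sorted by start; split_groups is a name I made up for it):

def split_groups(ranges):
    # Yield maximal groups of triplets such that triplets from different
    # groups can never interact; group boundaries are true gaps.
    group = [ranges[0]]
    max_end = ranges[0][1]
    for rng in ranges[1:]:
        if rng[0] > max_end + 1:  # +1 keeps touching ranges (join rule) together
            yield group
            group = []
        group.append(rng)
        max_end = max(max_end, rng[1])
    yield group

Each group could then be pushed through discretize_narrow on its own and the per-group outputs simply concatenated, since the groups are separated by true gaps.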

And how can I handle the general case? I think I need to know when each range ends (four states per node), but I can't get it to work correctly:

def get_quadruples(ranges):
    nodes = []
    for ini, fin, data in ranges:
        nodes.extend([(ini, False, -fin, data), (fin, True, ini, data)])
    return sorted(nodes)
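
For illustration (this is just me spelling out the ordering, not working code), on a tiny input the quadruples sort like this; the -fin component puts the longer range first when two ranges start at the same position:

nodes = get_quadruples([(0, 10, 'A'), (5, 20, 'B')])
# nodes == [(0, False, -10, 'A'), (5, False, -20, 'B'),
#           (10, True, 0, 'A'), (20, True, 5, 'B')]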

I can't post this on Code Review because it is a "how" question (I haven't achieved my goals yet). An answer on Code Review uses an Interval Tree; the timings below demonstrate how slow that is:

In [531]: %timeit discretize_narrow([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
14.3 µs ± 42.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [532]: %timeit merge_rows([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
891 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [533]: data = [(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')]

In [534]: merge_rows(data) == discretize_narrow(data)
Out[534]: True

In [535]: sample = make_sample(256, 65536, 16)

In [536]: merge_rows(sample) == discretize_narrow(sample)
Out[536]: True

In [537]: %timeit discretize_narrow(sample)
401 µs ± 3.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [538]: %time result = merge_rows(sample)
CPU times: total: 78.1 ms
Wall time: 56 ms

The Interval Tree approach is clearly not efficient, but I don't think I can implement a more efficient version of my own code. Test cases can be generated using the code in the linked GitHub page. My logic is the same as a dictionary update, and my brute-force implementation is exactly that (it uses dictionary updates to process the ranges, so it is correct by definition). The correctness of my smart approach was verified against the output of the brute-force one.

Manual test cases:

In [539]: discretize_narrow([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
Out[539]:
[[0, 1, 'B'],
 [2, 7, 'C'],
 [8, 8, 'D'],
 [9, 10, 'A'],
 [110, 150, 'E'],
 [250, 255, 'C'],
 [256, 270, 'D'],
 [271, 294, 'C'],
 [295, 300, 'E'],
 [500, 600, 'F']]

In [540]: discretize_narrow([(0, 100, 'A'), (10, 25, 'B'), (15, 25, 'C'), (20, 25, 'D'), (30, 50, 'E'), (40, 50, 'F'), (60, 80, 'G'), (150, 180, 'H')])
Out[540]:
[[0, 9, 'A'],
 [10, 14, 'B'],
 [15, 19, 'C'],
 [20, 25, 'D'],
 [26, 29, 'A'],
 [30, 39, 'E'],
 [40, 50, 'F'],
 [51, 59, 'A'],
 [60, 80, 'G'],
 [81, 100, 'A'],
 [150, 180, 'H']]

Machine generated general case (and correct output):

In [542]: ranges = []

In [543]: for _ in range(20):
     ...:     start = random.randrange(100)
     ...:     end = random.randrange(100)
     ...:     if start > end:
     ...:         start, end = end, start
     ...:     ranges.append([start, end, random.randrange(5)])

In [544]: ranges.sort()

In [545]: ranges
Out[545]:
[[0, 31, 0],
 [1, 47, 1],
 [1, 67, 0],
 [10, 68, 0],
 [15, 17, 2],
 [18, 39, 0],
 [19, 73, 3],
 [25, 32, 0],
 [26, 33, 1],
 [26, 72, 2],
 [26, 80, 2],
 [28, 28, 1],
 [29, 31, 4],
 [30, 78, 2],
 [36, 47, 0],
 [36, 59, 4],
 [44, 67, 3],
 [52, 61, 4],
 [58, 88, 1],
 [64, 92, 1]]

In [546]: brute_force_discretize(ranges)
Out[546]:
[(0, 0, 0),
 (1, 9, 1),
 (10, 14, 0),
 (15, 17, 2),
 (18, 18, 0),
 (19, 24, 3),
 (25, 25, 0),
 (26, 28, 1),
 (29, 29, 4),
 (30, 35, 2),
 (36, 43, 0),
 (44, 51, 3),
 (52, 57, 4),
 (58, 92, 1)]

Edit

Function that makes generic cases:

import random

def make_generic_case(num, lim, dat):
    ranges = []

    for _ in range(num):
        start = random.randrange(lim)
        end = random.randrange(lim)
        if start > end:
            start, end = end, start
        ranges.append([start, end, random.randrange(dat)])

    ranges.sort()
    return ranges
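
The 20-triplet machine-generated case above corresponds to make_generic_case(20, 100, 5):

ranges = make_generic_case(20, 100, 5)   # 20 triplets, bounds below 100, data in 0..4
correct = brute_force_discretize(list(ranges))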

Sample inputs where the code from the existing answer disagrees with the correct result:

Test case:

[
    [0, 31, 0],
    [1, 47, 1],
    [1, 67, 0],
    [10, 68, 0],
    [15, 17, 2],
    [18, 39, 0],
    [19, 73, 3],
    [25, 32, 0],
    [26, 33, 1],
    [26, 72, 2],
    [26, 80, 2],
    [28, 28, 1],
    [29, 31, 4],
    [30, 78, 2],
    [36, 47, 0],
    [36, 59, 4],
    [44, 67, 3],
    [52, 61, 4],
    [58, 88, 1],
    [64, 92, 1]
]

Output:

[
    (0, 0, 0),
    (1, 9, 1),
    (10, 14, 0),
    (15, 17, 2),
    (18, 18, 0),
    (19, 24, 3),
    (25, 25, 0),
    (26, 28, 1),
    (29, 29, 4),
    (30, 35, 2),
    (36, 43, 0),
    (44, 51, 3),
    (52, 57, 4),
    (58, 88, 1)
]

The output is almost the same as the correct one, except for the last range: it should be (58, 92, 1) instead of (58, 88, 1).

And another case:

[
    [4, 104, 4],
    [22, 463, 2],
    [24, 947, 2],
    [36, 710, 1],
    [37, 183, 1],
    [39, 698, 7],
    [51, 438, 4],
    [60, 450, 7],
    [120, 383, 2],
    [130, 193, 7],
    [160, 562, 5],
    [179, 443, 6],
    [186, 559, 6],
    [217, 765, 2],
    [221, 635, 2],
    [240, 515, 3],
    [263, 843, 3],
    [274, 759, 6],
    [288, 389, 5],
    [296, 298, 6],
    [333, 1007, 1],
    [345, 386, 5],
    [356, 885, 3],
    [377, 435, 5],
    [407, 942, 7],
    [423, 436, 1],
    [484, 926, 5],
    [496, 829, 0],
    [559, 870, 5],
    [610, 628, 1],
    [651, 787, 4],
    [735, 927, 1],
    [765, 1002, 1]
]

Output:

[(4, 21, 4),
 (22, 35, 2),
 (36, 38, 1),
 (39, 50, 7),
 (51, 59, 4),
 (60, 119, 7),
 (120, 129, 2),
 (130, 159, 7),
 (160, 178, 5),
 (179, 216, 6),
 (217, 239, 2),
 (240, 273, 3),
 (274, 287, 6),
 (288, 295, 5),
 (296, 298, 6),
 (299, 332, 5),
 (333, 344, 1),
 (345, 355, 5),
 (356, 376, 3),
 (377, 406, 5),
 (407, 422, 7),
 (423, 436, 1),
 (437, 483, 7),
 (484, 495, 5),
 (496, 558, 0),
 (559, 609, 5),
 (610, 628, 1),
 (629, 650, 5),
 (651, 734, 4),
 (735, 927, 1),
 (928, 942, 7),
 (943, 1007, 1)]

Again, it is almost correct, but the last three ranges are wrong: they are (735, 927, 1), (928, 942, 7), (943, 1007, 1) here, while they should be the single range (735, 1007, 1).

For most of the other inputs I tested, the output of the proposed solution is correct, but I have identified some edge cases, like the ones above, where it isn't.


I have now implemented the algorithm found here, but it isn't working correctly; it fails even the narrow case:

def discretize_gen(ranges):
    nodes = get_nodes(ranges)
    stack = []
    for (n1, e1, d1), (n2, e2, _) in zip(nodes, nodes[1:]):
        if e1:
            stack.remove(d1)
        else:
            stack.append(d1)
        start = n1 + e1
        end = n2 - (not e2)
        if start <= end and stack:
            yield start, end, stack[-1]

Use it like this: list(merge(discretize_gen(ranges))); the merge function can be found in the answer below.

It fails for many inputs. I have written an ugly function to compare its output against the correct output:

def compare_results(ranges):
    correct = brute_force_discretize(ranges)
    correct_set = set(correct)
    output = list(merge(discretize_gen(ranges)))
    output_set = set(output)
    errors = output_set ^ correct_set
    indices = [(i, correct.index(e)) for i, e in enumerate(output) if e not in errors]
    indices.append((len(output), None))
    comparison = [(a, b) if c - a == 1 else (slice(a, c), slice(b, d)) for (a, b), (c, d) in zip(indices, indices[1:])]
    result = {}
    for a, b in comparison:
        key = output[a]
        val = correct[b]
        if isinstance(key, list):
            key = tuple(key)
        result[key] = val
    return result

A test case where it fails:

[(73, 104, 3), (75, 98, 0), (78, 79, 3), (83, 85, 3), (88, 90, 2)]

Comparison:

{(73, 74, 3): (73, 74, 3),
 ((75, 77, 0), (78, 87, 3)): [(75, 77, 0),
  (78, 79, 3),
  (80, 82, 0),
  (83, 85, 3),
  (86, 87, 0)],
 ((88, 90, 2), (91, 104, 3)): [(88, 90, 2), (91, 98, 0), (99, 104, 3)]}

I don't know why it fails; the answer has a score of 11 and is accepted, yet somehow it doesn't work here. How can I fix it?


Edit

There was an indentation error in the function make_generic_case that made the append call unintentionally part of the if block, so a new range was only appended when start was greater than end (in which case start and end were swapped and the range appended).

My intention was to append a new range for every generated pair, and I fixed the problem by unindenting the line. I only noticed the error now; it was caused by copy-pasting: I pasted the code into Visual Studio Code and its automatic indentation fixing broke the indentation.

Ξένη Γήινος
  • Have you looked into interval trees? Seems to be an efficient data structure to find your triplets, and it should work in the general case where ranges are overlapping or not. They are made explicitly to make intersection of ranges efficient. – AnilCh Jul 15 '23 at 11:50
  • @AnilCh I have looked at that, and the performance of the Interval Tree implementation from PyPI is extremely slow; I have tested it thoroughly. You can see my tests with it in the linked posts. The point is it is too inefficient and it doesn't suit my needs. – Ξένη Γήινος Jul 15 '23 at 12:04
  • I'd suggest, for better answers, that you: include mostly pseudo-code with specific efficiency (what's your current efficiency? what is your brute force efficiency? what would be the summary of that process? What are the algorithms other people have already suggested, what's their efficiency, why they do not work, huge etc.). It also seems a bit confusing to me that interval trees would be slow when they are logarithmic to find intersections, if you know the answer to that, maybe include that reasoning in your question. – AnilCh Jul 15 '23 at 12:17
  • *"each triplet represents an instruction [...] describe the state of boxes after executing all instructions, using the least number of triplets."*: I don't understand. First it seems triplets are instructions, but then we should execute **all** instructions with the **least** number of triplets? What does that mean when the terms are virtual synonyms? – trincot Jul 15 '23 at 17:12
  • I thought my brute force implementation makes it clear, it is dictionary update and compression. `(0, 5, 'A')` means `{0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A'}`, they represent the same information, `(0, 5)` is compression, it means all integers from 0 to 5. With another triplet `(0, 2, 'B')`, the second is the same as `{0: 'B', 1: 'B', 2: 'B'}`, together they represent `{0: 'B', 1: 'B', 2: 'B', 3: 'A', 4: 'A', 5: 'A'}`, which is then compressed into triplet representation `[(0, 2, 'B'), (3, 5, 'A')]`. I have explained my rules thoroughly in my question above. – Ξένη Γήινος Jul 15 '23 at 17:32
  • The triplets can be seen as instructions, they can be ambiguous, but my rules are specified so that any selection of triplets can unambiguously describe a state, and the task is to find the set of instructions that describes the same state in the most concise way. – Ξένη Γήινος Jul 15 '23 at 17:44
  • This seems to be a bit different than the question posted on Code Review, which had constraints on the input triplets where if two triplets overlapped one was a subset of the other. You seem to have relaxed that condition. Fine. But if an input triplet is a compressed instruction specifying an update of a range of keys with the same value on an initially empty dictionary, then the order of input greatly matters. So it's not clear why you are first sorting your input. If the triplet order for doing updates is well-defined, why would the input triplets not be in that well-defined order? – Booboo Jul 15 '23 at 19:45
  • Sorry, I don't see any explanation why "all instructions should be executed with least number of triplets". That reads like "all instructions should be executed with least number of instructions" which... doesn't make sense. Somehow you first identify an instruction as a tuple (synonymous), but then seem to make a difference. Confusing. – trincot Jul 15 '23 at 20:28
  • Please provide an input for which `discretize_narrow` does not return the correct result. If there is no such input, then what is the problem? – trincot Jul 16 '23 at 13:49
  • I have already provided a test case in the question body for which `discretize_narrow` doesn't generate the correct output: it is the generic case in which ranges intersect, the last test case generated by the computer. I thought that was obvious. – Ξένη Γήινος Jul 16 '23 at 14:20
  • The "brute force" method is actually not in general inefficient but becomes more and more so as the difference between the maximum and minimum key values increases. I have a rather unusual solution, some might say "dumb", that in the general case performs poorly but out-performs the brute force approach when the input size, i.e. the number of triplets, is *small*. It also handles arbitrary ranges in any order and there is no requirement that if a range overlaps another range that one must be fully contained within the other. – Booboo Jul 19 '23 at 21:57

1 Answer


Here is an approach that is somewhat like the one you have in discretize_narrow, and seems to give a slight performance improvement (10-20%) when I timed it with make_sample(2048, 65536, 16) inputs:

def solve(ranges):
    if not ranges:
        return []
        
    def disjoint_segments(ranges):
        ranges.sort(key=lambda x: (x[0], -x[1]))
        # Add one dummy range so the loop runs one extra, flushing iteration
        ranges.append((float("inf"), None, None))  
        stack_end = []  # use separate stacks for ends, and data
        stack_data = []
        current = ranges[0][0]
        for start, end, data in ranges:
            # flush data from stack up to start - 1.
            while stack_end and stack_end[-1] < start:
                end2 = stack_end.pop()
                data2 = stack_data.pop()
                if current <= end2:
                    yield current, end2, data2
                    current = end2 + 1
            if stack_end and current < start:
                yield current, start - 1, stack_data[-1]
            # stack the current tuple's info
            current = start
            if not stack_end or stack_data[-1] != data or end > stack_end[-1]:
                stack_end.append(end)
                stack_data.append(data)

    def merge(segments):
        start, end, data = next(segments)  # keep one segment buffered
        for start2, end2, data2 in segments:
            if end + 1 == start2 and data == data2:  # adjacent & same data
                end = end2  # merge
            else:
                yield start, end, data
                start, end, data = start2, end2, data2
        yield start, end, data  # flush the buffer
    
    return list(merge(disjoint_segments(ranges)))

The above function should also produce correct results for the "generic" case.
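
A quick way to spot-check that against your brute force (a sketch; it assumes make_generic_case and brute_force_discretize from the question are in scope, and passes copies because both solve and brute_force_discretize mutate their input):

for _ in range(100):
    sample = make_generic_case(50, 500, 5)
    if solve(list(sample)) != brute_force_discretize(list(sample)):
        print("mismatch for", sample)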

trincot