How should I take a list containing [start, end] ranges and "absorb" shorter ranges into overlapping bigger ones?

Question

I currently have a list that looks something like:

>>> list_in_question = [[0, 4], [0, 3], [0, 4], [3, 4], [6, 7]]

Each list element represents a span/range with the first element being the start and the second being the end. I want to take the overlapping elements and combine them so the largest range is remaining. There are no cases where the lists are "partially overlapping" (e.g., [3, 5], [4, 6]).

The list that I want is something like:

>>> list_in_question = [[0, 4], [0, 3], [0, 4], [3, 4], [6, 7]]
>>> list_i_want = [[0, 4], [6, 7]]

Some ways that I've thought of doing this is to make two separate lists to keep track of whether an element is valid or not, but the code is hard to read and (I assume) prone to errors. It looks something like this:

valid_lists = []
invalid_lists = []

# Outer loop to go through element lists.
for i, _ in enumerate(list_in_question):
    if list_in_question[i] in valid_lists:
        continue

    # Inner loop to go through subsequent element lists after the i'th one.
    for j in range(i + 1, len(list_in_question)):
        first_range = list_in_question[i]
        second_range = list_in_question[j]

        first_range_expanded = set(range(first_range[0], first_range[1])
        second_range_expanded = set(range(second_range[0], second_range[1])

        if first_range_expanded.issubset(second_range_expanded):
            if second_range not in valid_lists:
                valid_lists.append(second_range)

            invalid_lists.append(first_range)
        elif second_range_expanded.issubset(first_range_expanded):
            if first_range not in valid_lists:
                valid_lists.append(first_range)

            invalid_lists.append(second_range)


for range_list in list_in_question:
    if range_list not in valid_lists and range_list not in invalid_lists:
        valid_lists.append(range_list)

It works for some specific cases, but I'd like to know if there's a better way to go about this. Thanks.

this can help you [merge overlap periods](https://www.geeksforgeeks.org/merging-intervals/) — sahasrara62, Feb 19 '21 at 12:25

score 2 · Accepted Answer · answered Feb 19 '21 at 10:49

l = [[0, 4], [0, 3], [0, 4], [3, 4], [6, 7]]

First, we can make the problem a lot easier (by only merging into previous lists) by sorting by range width (we can always recover the order later if needed):

l.sort(key=lambda r: r[1] - r[0], reverse=True)

This works because if b gets absorbed into a we must have that the width of range b is smaller or equal to a (and when equal either could get absorbed with no change in result).

Then (more efficient algorithms exist but are overkill unless you have huge instances), filtering is relatively simple:

absorbers = []
for r in l:
    absorbed = any(a[0] <= r[0] <= r[1] <= a[1]
                   for a in absorbers)
    if not absorbed:
        absorbers.append(r)

Nerveless_child · Answer 2 · 2021-02-19T12:26:27.060

This should do it:

list_in_question.sort(key=lambda x: x[0])
merged_span = [list_in_question[0]]

for span in list_in_question:
    if merged_span[-1][0] <= span[0] <= merged_span[-1][1]:
        merged_span[-1][1] = max(span[1], merged_span[-1][1])

    elif merged_span[-1][0] <= span[1] <= merged_span[-1][1]:
        merged_span[-1][0] = min(span[0], merged_span[-1][0])
        
    else:
        merged_span.append(span)
        
print(merged_span)

You might need to +1 or -1 for span[0] or span[1] when comparing, depending on what behavior you want.

Alain T. · Answer 3 · 2021-02-19T20:54:44.680

If the lengths of the ranges aren't too large, you could use a set to detect overlaps. Processing the ranges in decreasing order of lengths will ensure that the wider ranges are in the set first and so, any shorter ones that overlap will be excluded.

rangeList = [[0, 4], [0, 3], [0, 4], [3, 4], [6, 7]]

merged    = [ [a,b] for cover in [set()]
                    for a,b in sorted(rangeList,key=lambda r:r[0]-r[1])
                    if a not in cover and not cover.update(range(a,b+1)) ]

print(merged)

# [[0, 4], [6, 7]]

You can also do it without sorting by expanding merged ranges as you go through the list:

merged = list()
for a,b in rangeList:
    i,c,d = next( ((i,c,d) for i,(c,d) in enumerate(merged) if c<=b and a<=d), 
                  (len(merged),a,b))
    merged[i:i+1] = [[min(a,c),max(b,d)]]

How should I take a list containing [start, end] ranges and "absorb" shorter ranges into overlapping bigger ones?

3 Answers3