1

I have a list of intervals in lists A,B I need to find which intervals overlap between A,B I am trying to find a better method than the brute-force one which has time complexity O(len(A)*len(B)) cause I am working with pretty large lists.

The best method I found is the following one, but it only works correctly if intervals in each list are pairwise disjoint(non-overlapping) which in my case doesn't happen, cause intervals in each list can overlap with each other.

def find_overlaps(A, B):
    overlaps = []
    i = j = 0
    
    # Sorts intervals in A,B based on start_position of each interval
    A_start_sorted = sorted(A, key=lambda x: x[0])
    B_start_sorted = sorted(B, key=lambda x: x[0])
    
    # Sorts intervals in A,B based on end_position of each interval
    A_end_sorted = sorted(A, key=lambda x: x[1])
    B_end_sorted = sorted(B, key=lambda x: x[1])
    
    A = A_start_sorted
    B = B_start_sorted
    
    
    while i < len(A) and j < len(B):
        start_A, end_A = A[i]
        start_B, end_B = B[j]
        
        # Let's check if A[i] intersects B[j].
        # lo - the startpoint of the intersection
        # hi - the endpoint of the intersection
        lo = max(A[i][0], B[j][0])
        hi = min(A[i][1], B[j][1])
        if lo <= hi:
            overlaps.append([[start_A, end_A],[start_B, end_B]])

        # Remove the interval with the smallest endpoint
        if end_A < end_B:
            i += 1
        else:
            j += 1
    
    return overlaps

A = [ [0,8], [2,3], [5,6], [7,9], [10,25], [30,40] ]
B = [ [0,6], [2,5], [3,4], [6,12], [8,16],[10,20] ]

overlaps = find_overlaps(A,B)

print(f"Overlaps are: {len(overlaps)}")
print(f"overlaps = {overlaps}")

For example the above code returns:

Overlaps are: 9
overlaps = [[[0, 8], [0, 6]], [[0, 8], [2, 5]], [[0, 8], [3, 4]], [[0, 8], [6, 12]], [[5, 6], [6, 12]], [[7, 9], [6, 12]], [[10, 25], [6, 12]], [[10, 25], [8, 16]], [[10, 25], [10, 20]]]

Whereas the right output should be:

Overlaps are: 16
overlaps = 
[[[[0,8],[0,6]], [[0,8],[2,5]], [[0,8],[3,4]], [[0,8],[6,12]], [[0,8],[8,16]], 
[[2,3],[0,6]], [[2,3],[2,5]], [[2,3],[3,4]],
[[5,6],[0,6]], [[5,6],[2,5]], [[5,6],[6,12]],
[[7,9],[6,12]], [[7,9],[8,16]],
[[10,25],[6,12]], [[10,25],[8,16]], [[10,25],[10,20]]]

I already tried changing the condition if end_A < end_B on when to move to the next A,B item but always seems to leave some overlaps out. I also tried adding a second while i < len(A) and j < len(B): using differently sorted A,B than the previous block so that it gets the overlaps that the first block misses but no success.

I think the idea is right but I can't figure out how to fix the logic. Any help would be appreciated!

1 Answers1

0

I'd remove the sorted() functions and use min()/max() to find overlapping intervals:

def intervals_overlap(i, j):
    if i[1] == j[0] or j[1] == i[0]:
        return True

    overlap = max(0, min(i[1], j[1]) - max(i[0], j[0]))
    return overlap > 0

def overlaping_intervals(A, B):
    out = []
    for i in A:
        for j in B:
            if intervals_overlap(i, j):
                out.append([i, j])
    return out

A = [[0, 8], [2, 3], [5, 6], [7, 9], [10, 25], [30, 40]]
B = [[0, 6], [2, 5], [3, 4], [6, 12], [8, 16], [10, 20]]


out = overlaping_intervals(A, B)
print(len(out))
print(out)

Prints:

16
[
    [[0, 8], [0, 6]],
    [[0, 8], [2, 5]],
    [[0, 8], [3, 4]],
    [[0, 8], [6, 12]],
    [[0, 8], [8, 16]],
    [[2, 3], [0, 6]],
    [[2, 3], [2, 5]],
    [[2, 3], [3, 4]],
    [[5, 6], [0, 6]],
    [[5, 6], [2, 5]],
    [[5, 6], [6, 12]],
    [[7, 9], [6, 12]],
    [[7, 9], [8, 16]],
    [[10, 25], [6, 12]],
    [[10, 25], [8, 16]],
    [[10, 25], [10, 20]],
]
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91