I have a list of intervals in lists A,B I need to find which intervals overlap between A,B I am trying to find a better method than the brute-force one which has time complexity O(len(A)*len(B)) cause I am working with pretty large lists.
The best method I found is the following one, but it only works correctly if intervals in each list are pairwise disjoint(non-overlapping) which in my case doesn't happen, cause intervals in each list can overlap with each other.
def find_overlaps(A, B):
overlaps = []
i = j = 0
# Sorts intervals in A,B based on start_position of each interval
A_start_sorted = sorted(A, key=lambda x: x[0])
B_start_sorted = sorted(B, key=lambda x: x[0])
# Sorts intervals in A,B based on end_position of each interval
A_end_sorted = sorted(A, key=lambda x: x[1])
B_end_sorted = sorted(B, key=lambda x: x[1])
A = A_start_sorted
B = B_start_sorted
while i < len(A) and j < len(B):
start_A, end_A = A[i]
start_B, end_B = B[j]
# Let's check if A[i] intersects B[j].
# lo - the startpoint of the intersection
# hi - the endpoint of the intersection
lo = max(A[i][0], B[j][0])
hi = min(A[i][1], B[j][1])
if lo <= hi:
overlaps.append([[start_A, end_A],[start_B, end_B]])
# Remove the interval with the smallest endpoint
if end_A < end_B:
i += 1
else:
j += 1
return overlaps
A = [ [0,8], [2,3], [5,6], [7,9], [10,25], [30,40] ]
B = [ [0,6], [2,5], [3,4], [6,12], [8,16],[10,20] ]
overlaps = find_overlaps(A,B)
print(f"Overlaps are: {len(overlaps)}")
print(f"overlaps = {overlaps}")
For example the above code returns:
Overlaps are: 9
overlaps = [[[0, 8], [0, 6]], [[0, 8], [2, 5]], [[0, 8], [3, 4]], [[0, 8], [6, 12]], [[5, 6], [6, 12]], [[7, 9], [6, 12]], [[10, 25], [6, 12]], [[10, 25], [8, 16]], [[10, 25], [10, 20]]]
Whereas the right output should be:
Overlaps are: 16
overlaps =
[[[[0,8],[0,6]], [[0,8],[2,5]], [[0,8],[3,4]], [[0,8],[6,12]], [[0,8],[8,16]],
[[2,3],[0,6]], [[2,3],[2,5]], [[2,3],[3,4]],
[[5,6],[0,6]], [[5,6],[2,5]], [[5,6],[6,12]],
[[7,9],[6,12]], [[7,9],[8,16]],
[[10,25],[6,12]], [[10,25],[8,16]], [[10,25],[10,20]]]
I already tried changing the condition if end_A < end_B
on when to move to the next A,B item but always seems to leave some overlaps out.
I also tried adding a second while i < len(A) and j < len(B):
using differently sorted A,B than the previous block so that it gets the overlaps that the first block misses but no success.
I think the idea is right but I can't figure out how to fix the logic. Any help would be appreciated!