11

i'm trying to write some code to calculate the difference of two sets of intervals A - B , Interval endpoints are integers , but i'm strugling to come with efficient solution , any suggestion would be much appreciated example : [(1, 4), (7, 9)] - [(3,5)] = [(1, 3), (7, 9)]

well this the best i tried so far ( the two lists are already sorted )

class tp():
   def __repr__(self):
       return '(%.2f,%.2f)' % (self.start, self.end)
   def __init__(self,start,end): 
       self.start=start
       self.end=end



z=[tp(3,5)] #intervals to be subtracted
s=[tp(1, 4)),tp(7, 9), tp(3,4),tp(4,6)]

for x in s[:]:
   if z.end < x.start:
    break
   elif z.start < x.start and z.end > x.start and z.end < x.end:
    x.start=z.end
   elif z.start < x.start and z.end > x.end:
    s.remove(x)
   elif z.start > x.start and z.end < x.end:
    s.append(tp(x.start,z.start))
    s.append(tp(z.end,x.end))
    s.remove(x)
   elif z.start > x.start and z.start < x.end and z.end > x.end:
    x.end=z.start
   elif z.start > x.end:
    continue
shx2
  • 61,779
  • 13
  • 130
  • 153
  • i tried doing the difference between each element in set 1 with each element with set 2 , it gives correct answer but i'm hopping for something more efficient – desprategstreamer Nov 18 '13 at 23:28
  • If you want something efficient, you should show us what you did and we will help you. – Gabriel L. Nov 18 '13 at 23:30
  • ok i eddited the question – desprategstreamer Nov 18 '13 at 23:46
  • Are you looking for help in Python? Also, the last block of code looks like it's not indented correctly. – godel9 Nov 19 '13 at 00:13
  • You say "the two lists are already sorted" but your example has `s=[tp(1, 4)),tp(7, 9), tp(3,4),tp(4,6)]` which is not sorted according to any obvious ordering operator. Also, your code seems to assume that `z` is a single `tp` but your example shows it to be a list. Could you clarify your expectations, please? – rici Nov 19 '13 at 16:37

3 Answers3

15

The only way to make the operation efficient is to keep the interval lists sorted and non-overlapping (which can be done in O(n log n)). [See Notes, below].

With both lists sorted and non-overlapping, any set operation (union, intersection, difference, symmetric difference) can be performed with a simple merge.

The merge operation is straightforward: simultaneously loop over the endpoints of both arguments, in order. (Note that the endpoints of each interval list are sorted because we require that the intervals not overlap.) For each endpoint discovered, decide whether it is in the result or not. If the result currently has an odd number of endpoints and the new endpoint is not in the result, add it to the result; similarly, if the result currently has an even number of endpoints and the new endpoint is in the result, add it to the result. At the end of this operation, the result is a list of endpoints, alternating between interval start and interval end.

Here it is in python:

# In all of the following, the list of intervals must be sorted and 
# non-overlapping. We also assume that the intervals are half-open, so
# that x is in tp(start, end) iff start <= x and x < end.

def flatten(list_of_tps):
    """Convert a list of intervals to a list of endpoints"""
    return reduce(lambda ls, ival: ls + [ival.start, ival.end],
                  list_of_tps, [])
    
def unflatten(list_of_endpoints):
    """Convert a list of endpoints, with an optional terminating sentinel,
       into a list of intervals"""
    return [tp(list_of_endpoints[i], list_of_endpoints[i + 1])
            for i in range(0, len(list_of_endpoints) - 1, 2)]
    
def merge(a_tps, b_tps, op):
    """Merge two lists of intervals according to the boolean function op"""
    a_endpoints = flatten(a_tps)
    b_endpoints = flatten(b_tps)
    
    sentinel = max(a_endpoints[-1], b_endpoints[-1]) + 1
    a_endpoints += [sentinel]
    b_endpoints += [sentinel]
    
    a_index = 0
    b_index = 0
    
    res = []
    
    scan = min(a_endpoints[0], b_endpoints[0])
    while scan < sentinel:
        in_a = not ((scan < a_endpoints[a_index]) ^ (a_index % 2))
        in_b = not ((scan < b_endpoints[b_index]) ^ (b_index % 2))
        in_res = op(in_a, in_b)
        
        if in_res ^ (len(res) % 2): res += [scan]
        if scan == a_endpoints[a_index]: a_index += 1
        if scan == b_endpoints[b_index]: b_index += 1
        scan = min(a_endpoints[a_index], b_endpoints[b_index])
    
    return unflatten(res)

def interval_diff(a, b):
    return merge(a, b, lambda in_a, in_b: in_a and not in_b)

def interval_union(a, b):
    return merge(a, b, lambda in_a, in_b: in_a or in_b)

def interval_intersect(a, b):
    return merge(a, b, lambda in_a, in_b: in_a and in_b)

Notes

  1. The intervals [a, b) and [b, c) are not overlapping since they are disjoint; b belongs only to the second one. The union of these two intervals will still be [a,c). But for the purposes of the functions in this answer, it's best to also require intervals to not be adjacent, by extending the definition of "non-overlapping" to include the case where the intervals are adjacent; otherwise, we risk finding the adjacency point needlessly included in the output. (That's not strictly speaking wrong, but it's easier to test functions if there output is deterministic.)

    Here's a sample implementation of a function which normalises an arbitrary list of intervals into a sorted, non-overlapping interval.

    def interval_normalise(a):
        rv = sorted(a, key = lambda x: x.start)
        out = 0
        for scan in range(1, len(rv)):
            if rv[scan].start > rv[out].end:
                if rv[out].end > rv[out].start: out += 1
                rv[out] = rv[scan]
            elif rv[scan].end > rv[out].end:
                rv[out] = tp(rv[out].start, rv[scan].end)
        if rv and rv[out].end > rv[out].start: out += 1
        return rv[:out]
    
rici
  • 234,347
  • 28
  • 237
  • 341
  • thanks , your answer really helped me , sorry again for the not so clear question i'm kinda a newbie here – desprategstreamer Nov 20 '13 at 09:57
  • I am fairly sure that XOR in the line: if in_res ^ ((len(res) % 2) should be an OR. – Dean Povey Jan 25 '17 at 14:35
  • @DeanPovey: It should be simple enough to test :) But afaics, the xor is correct. The result being built is either in a state where the last segment is open (just has a start) or is closed, and the new point is either in or out. You need to add the new endpoint if it's in-ness is different from the in-ness of the end of the result. And different from is encapsulated in an XOR. – rici Jan 25 '17 at 16:08
  • My mistake @rici. Nice answer. – Dean Povey Jan 27 '17 at 00:07
  • Nice! `functools.reduce` is an alternative to `reduce` in Python3 according to this thread: http://stackoverflow.com/questions/13638898/how-to-use-filter-map-and-reduce-in-python-3 – tommy.carstensen Feb 13 '17 at 18:19
  • 1
    @tommy.carstensen: It's not an alternative, really; it's the very same function, but banished to the `functools` module for some reason which I don't fully understand. Added a note in my code. – rici Feb 13 '17 at 20:07
  • @rici this is so beautiful. Can be optimized to not use flatten and unflatten, but I was definitely gasping for breath while understanding it. – Alex Gusev Sep 12 '17 at 07:17
  • @AlexGusev, creating a similar function that does not rely on `flatten` and `unflatten` is not an optimization, not even a marginal one. Sorting is much more computationally expensive than `flatten` and `unflatten`. Sorting is what defines the computational complexity of the algorithm. – Boris D. Teoharov Jan 23 '20 at 14:00
  • @rici, I've just noticed that with the current implementation of the `merge` in your answer `interval_union` does not return correct answers. Could you, please, update your answer so `interval_union` returns correct answers, too. – Boris D. Teoharov Apr 09 '20 at 11:28
  • @boris: Care to provide an example which demonstrates the incorrect answer? – rici Apr 09 '20 at 12:05
  • @rici, here is an example: https://repl.it/repls/YellowgreenMustyMemwatch The result contains more pairs than needed. – Boris D. Teoharov Apr 09 '20 at 12:51
  • 1
    @boris: the first line of my answer says that the lists need to be "sorted and **non-overlapping**". Your example does not satisfy those constraints. I added a normalisation function to my answer in case it is useful. – rici Apr 09 '20 at 13:03
  • @boris: in fact, rereading the answer I see that it mentions the non-overlapping requirement three times in the narrative plus once in the comment in the Python code. :-) – rici Apr 09 '20 at 13:17
  • I agree with the others that this is a beautiful solution. One minor simplification would be to replace `in_a = not ((scan < a_endpoints[a_index]) ^ (a_index % 2))` with `in_a = (scan < a_endpoints[a_index]) == (a_index % 2)`, and similarly for the `in_b` assignment. This is based on the equivalence of `not(x ^ y)` and `x == y`. – Peter Simon Sep 29 '21 at 16:20
2

This can be solved with a sweep line algorithm. The idea is to keep all the starting points of intervals from both sets in one sorted array and end points in other sorted array marking them with information that they belong to which set. e.g.

       A              B
[(1, 4), (7, 9)] - [(3,5)]
A: start:[1,7] end:[4,9], B: start:[3]end:[5]
start:[(1,a),(3,b),(7,a)]
end: [(4,a),(5,b),(9,a)]

Now have two pointer one to beginning of each array.In a loop increment one which points to lowest value adding intervals which start with a till they end with b or a. e.g. for above we will iterate points in this order

(1,a) (3,b) (4,a) (5,b) (7,a) (9,a)
# and adding intervals where we have seen an start a and an end a or b
(1,3) (7,9)

This leads to an linear solution in terms of number of intervals.

FUD
  • 5,114
  • 7
  • 39
  • 61
  • thanks for the anwser , but i thinnk this fails with the above : [4 7] - [2,5] the answer should be [5 7] but we won't consider it because it doesn't start with a. – desprategstreamer Nov 19 '13 at 08:07
0

Another implementation using numpy. I assume, as I think is more natural using integer endpoints, that the intervals are closed. For the approach I suggest below, we absolutely need to take care of the (semi-closed) intervals including -infinity and +infinity.

def merge_intervals(intervals):
    # Normalize to sorted non-overlaping intervals. Aimilar idea as in
    # https://www.geeksforgeeks.org/merging-intervals/
    if len(intervals)==0: return intervals 
    assert np.all(intervals[:,0]<=intervals[:,1]), f"merge_intervals: intervals not well defined. intervals={intervals}"
    if len(intervals)==1: return intervals    
    intervals = np.sort(intervals.copy(),axis=0)
    stack = []
    # insert first interval into stack
    stack.append(intervals[0])
    for i in intervals[1:]:
        # Check for overlapping interval,
        # if interval overlap
        if i[0] > stack[-1][1]+1:
            stack.append(i)
        else:
            stack[-1][1] = max(stack[-1][1], i[1])
    return np.array(stack)

def union_intervals(a,b):
    return merge_intervals(np.r_[a,b])

# The folowing is the key function. Needs to work  
# well with infinities and empty sets.
def complement_intervals(a): 
    if len(a)==0: return np.array([[-np.inf,np.inf]])
    a_normalized = merge_intervals(a)
    result0 = np.r_[-np.inf,a_normalized[:,1]+1]
    result1 = np.r_[a_normalized[:,0]-1,np.inf] 
    non_empty = np.logical_and(result0 < np.inf, result1 > -np.inf)
    result = np.c_[result0[non_empty],result1[non_empty]]
    if np.array_equal(result,np.array([[np.inf,-np.inf]])):
        result = np.array([])
    return merge_intervals(result)

def intersection_intervals(a,b):
    union_of_complements = union_intervals(complement_intervals(a),complement_intervals(b))
    return  complement_intervals(union_of_complements)

def difference_intervals(a,b):
    return intersection_intervals(a,complement_intervals(b))