26

I have these ranges:

7,10
11,13
11,15
14,20
23,39

I need to perform a union of the overlapping ranges to give ranges that are not overlapping, so in the example:

7,20
23,39

I've done this in Ruby where I have pushed the start and end of the range in array and sorted them and then perform union of the overlapping ranges. Any quick way of doing this in Python?

Georgy
  • 12,464
  • 7
  • 65
  • 73
bioinf80
  • 523
  • 3
  • 7
  • 14
  • 1
    I posted an uninteresting answer. Except that my solution produces right results even if there are for example 6,11 in the starting data, whereas eumiro's solution produces false result in this case. But such cases are probably unlikely to happen in your data. – eyquem Mar 07 '13 at 17:22
  • 1
    Are you assuming that only integers are valid inputs? 10.5 is not included in the input ranges, but it is included in the output ranges. And even with integers, are you assuming closed ranges rather than Python's standard half-open ranges? The union of x[7:10] and x[11:13] is x[7], x[8], x[9], x[11], x[12]. It does not include x[10]. – Dave Mar 07 '13 at 17:56
  • 1
    http://stackoverflow.com/questions/15787383/numpy-how-to-join-arrays-to-get-the-union-of-several-ranges – Olga Dec 24 '13 at 08:59

5 Answers5

27

Let's say, (7, 10) and (11, 13) result into (7, 13):

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1] = (b[-1][0], end)
    else:
        b.append((begin, end))

b is now

[(7, 20), (23, 39)]

EDIT:

As @CentAu correctly notices, [(2,4), (1,6)] would return (1,4) instead of (1,6). Here is the new version with correct handling of this case:

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1][1] = max(b[-1][1], end)
    else:
        b.append([begin, end])
tommy.carstensen
  • 8,962
  • 15
  • 65
  • 108
eumiro
  • 207,213
  • 34
  • 299
  • 261
  • 1
    (+1) I think this might look quite nice written as a generator (maybe). – NPE Mar 07 '13 at 14:30
  • 3
    For future reference, this does not work for ranges that include a smaller range. e.g. results of `[(2,4),(1,6)]` will be `[(1, 4)]` instead of `[(1, 6)]`. – CentAu Dec 14 '15 at 23:51
  • 1
    This line is wrong: `b[-1][1] = max(b[-1][1], end)`. It causes `TypeError: 'tuple' object does not support item assignment`. You need to use a mutable list `b.append([begin, end])` – tommy.carstensen Feb 13 '17 at 20:10
  • Try this `a = ([[7, 10], [7, 13], [11, 15], [6, 19], [6, 7], [14, 20], [23, 39], [40, 4], [200, 1] ])` and it will spit `[[6, 20], [23, 39], [200, 1]]` . Shouldn't it never bring `[200,1]` back? – JVK Feb 27 '18 at 19:59
16

Old question. But I wanted to add this answer for future references. sympy can be used to achieve union of intervals:

from sympy import Interval, Union
def union(data):
    """ Union of a list of intervals e.g. [(1,2),(3,4)] """
    intervals = [Interval(begin, end) for (begin, end) in data]
    u = Union(*intervals)
    return [list(u.args[:2])] if isinstance(u, Interval) \
       else list(u.args)

If output of Union is more than two intervals is a Union object while when there is a single interval, the output is an Interval object. That's the reason for the if statement in the return line.

examples:

In [26]: union([(10, 12), (14, 16), (15, 22)])
Out[26]: [[10, 12], [14, 22]]

In [27]: union([(10, 12), (9, 16)])
Out[27]: [[9, 16]]
CentAu
  • 10,660
  • 15
  • 59
  • 85
  • 2
    There are several things wrong with this answer. In the case that the Union is a list of Intervals, you return a Python list of sympy.Interval, which is not matching what you return if the Union is a single Interval. In addition, given that "sympy expression does not convert to " is one of the most common [sympy]-tagged questions, you should warn people that they often will need to explicitly convert back to Python native types using int(...), float(...) and complex(...). – Ron Kaminsky Sep 20 '20 at 06:41
1

I tried with particular cases of presence of (45, 46) and (45, 45)
and also test cases that are unlikely to happen in your application: presence of (11,6), presence of (-1, -5), presence of (-9, 5), presence of (-3, 10).
Anyway the results are right for all these cases, it's a point.

The algorithm:

def yi(li):
    gen = (x for a,b in li for x in xrange(a,b+1))
    start = p = gen.next()
    for x in gen:
        if x>p+2:
            yield (start,p)
            start = p = x
        else:
            p = x
    yield (start,x)

If aff in the following code is set to True, the steps of the execution are displayed.

def yi(li):
    aff = 0
    gen = (x for a,b in li for x in xrange(a,b+1))
    start = p = gen.next()
    for x in gen:
        if aff:
            print ('start %s     p %d  p+2 %d     '
                   'x==%s' % (start,p,p+2,x))
        if x>p+2:
            if aff:
                print 'yield range(%d,%d)' % (start,p+1)
            yield (start,p)
            start = p = x
        else:
            p = x
    if aff:
        print 'yield range(%d,%d)' % (start,x+1)
    yield (start,x)



for li in ([(7,10),(23,39),(11,13),(11,15),(14,20),(45,46)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,46),(45,45)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,45)],

           [(7,10),(23,39),(11,13),(11,6),(14,20),(45,46)], 
           #1 presence of (11, 6)
           [(7,10),(23,39),(11,13),(-1,-5),(14,20),(45,45)], 
           #2  presence of (-1,-5)
           [(7,10),(23,39),(11,13),(-9,-5),(14,20),(45,45)], 
           #3  presence of (-9, -5)
           [(7,10),(23,39),(11,13),(-3,10),(14,20),(45,45)]): 
           #4  presence of (-3, 10)

    li.sort()
    print 'sorted li    %s'%li
    print '\n'.join('  (%d,%d)   %r' % (a,b,range(a,b)) 
                     for a,b in li)
    print 'list(yi(li)) %s\n' % list(yi(li))

result

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20),
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(7, 10), (11, 6), (11, 13), (14, 20), 
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,6)   []
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(-1, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-1,-5)   []
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(-9, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-9,-5)   [-9, -8, -7, -6]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-9, -5), (7, 20), (23, 39), (45, 45)]

sorted li    [(-3, 10), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-3,10)   [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-3, 20), (23, 39), (45, 45)]
eyquem
  • 26,771
  • 7
  • 38
  • 46
0

The following function works well for the given example data:

def range_overlap_adjust(list_ranges):
    overlap_corrected = []
    for start, stop in sorted(list_ranges):
        if overlap_corrected and start-1 <= overlap_corrected[-1][1] and stop >= overlap_corrected[-1][1]:
            overlap_corrected[-1] = min(overlap_corrected[-1][0], start), stop
        elif overlap_corrected and start <= overlap_corrected[-1][1] and stop <= overlap_corrected[-1][1]:
            break
        else:
            overlap_corrected.append((start, stop))
    return overlap_corrected

Usage

list_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]   
print(range_overlap_adjust(list_ranges))
# prints [(7, 20), (23, 39)]
Georgy
  • 12,464
  • 7
  • 65
  • 73
0

Here is a one-liner using functools.reduce (assuming (x, 10) and (11, y) overlap):

reduce(
    lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
        if acc[-1][1] >= el[0] - 1
        else acc + [el],
    ranges[1::],
    ranges[0:1]
)

This begins with the first range and uses reduce to go through the rest of the ranges. It compares the last element (acc[-1]) with the next range (el). If they overlap, it replaces the last element with the min and max of the two ranges (acc[:-1:] + [min, max]). If they don't overlap, it simply puts this new range at the end of the list (acc + [el]).

Example:

from functools import reduce

example_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]

def combine_overlaps(ranges):
    return reduce(
        lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
            if acc[-1][1] >= el[0] - 1
            else acc + [el],
        ranges[1::],
        ranges[0:1],
    )

print(combine_overlaps(example_ranges))

Output:

[(7, 20), (23, 39)]
ParkerD
  • 1,214
  • 11
  • 18
  • This answer appears to assume the input list is in a particular order. For instance: `print(combine_overlaps([(7,10), (23,39), (9,20)]))`. This prints `[(7, 10), (9, 39)]` instead of `[(7,20), (23,39)]`. – Stef Feb 03 '22 at 14:04