49

I have a set of ranges that might look something like this:

[(0, 100), (150, 220), (500, 1000)]

I would then add a range, say (250, 400) and the list would look like this:

[(0, 100), (150, 220), (250, 400), (500, 1000)]

I would then try to add the range (399, 450), and it would error out because that overlapped (250, 400).

When I add a new range, I need to search to make sure the new range does not overlap an existing range. And no range will ever be in the list that overlaps another range in the list.

To this end, I would like a data structure that cheaply maintained its elements in sorted order, and quickly allowed me to find the element before or after a given element.

  • Is there a better way to solve this problem? Is there a data structure like that available in Python?
  • I know the bisect module exists, and that's likely what I will use. But I was hoping there was something better.
  • EDIT: I solved this using the bisect module. I had a link to the code since it was a bit longish. Unfortunately, paste.list.org turned out to be a bad place to put it because it's not there anymore.
Omnifarious
  • 54,333
  • 19
  • 131
  • 194
  • Consider a standard tree with a modified node-descend condition (have min/max vs. singular). Perhaps a simple [Red-Black](http://en.wikipedia.org/wiki/Red-black_tree), AVL or similar. Python examples/implementations should abound. I think this is too specialized to be a "standard" structure. –  Apr 03 '11 at 05:09
  • 4
    Doesn't the range (250,400) overlap with (150,300) in your example? – ire_and_curses Apr 03 '11 at 05:24
  • @ire_and_curses: *sheepish grin* Yes, yes it does. *sigh* I'm fixing it now. – Omnifarious Apr 03 '11 at 09:01
  • 1
    @pst: It seems silly to have to implement a Red-Black tree in Python. – Omnifarious Apr 06 '11 at 22:33
  • 1
    Not directly answering your question but it seems like what you really want is an intervaltree. https://pypi.python.org/pypi/intervaltree – yarian Jan 27 '17 at 19:02
  • Possible duplicate of [Does python have a sorted list?](https://stackoverflow.com/questions/1109804/does-python-have-a-sorted-list) – Ciro Santilli OurBigBook.com May 23 '17 at 08:46
  • Whether you want a list or tree depends on whether *"quickly find element before or after given element"* means O(1) or O(log n). And *"cheaply maintain its elements in sorted order"* depends on whether you mean only space-efficiency (-> can't beat sorted list) or a combination of space- and CPU-efficiency. And whether your main use case is lookup-by-value, deletion, replacement, merge/insert, pivoting, rebalancing. – smci Mar 19 '22 at 23:35
  • @smci - My use case was implementing a TCP-like network protocol. The ranges were keeping track of the logical stream of bytes had been received. Honestly, thinking about how it would likely work in practice, a short list maintained with insertion and searched linearly would probably be just fine. :-) – Omnifarious Mar 29 '22 at 13:38
  • Ominfarious: ok, but users in general will come based on the question as stated, and also with their own use case in mind. So it's important to define *"quick lookup for preceding/following element"* as O(1) or O(log n). And whether *"cheap in-order insert"* means only space-efficiency (-> can't beat sorted list) or a combination of space- and CPU-efficiency. For most users it comes down to tree vs sorted-list. – smci Mar 31 '22 at 00:10

5 Answers5

20

It looks like you want something like bisect's insort_right/insort_left. The bisect module works with lists and tuples.

import bisect

l = [(0, 100), (150, 300), (500, 1000)]
bisect.insort_right(l, (250, 400))
print l # [(0, 100), (150, 300), (250, 400), (500, 1000)]
bisect.insort_right(l, (399, 450))
print l # [(0, 100), (150, 300), (250, 400), (399, 450), (500, 1000)]

You can write your own overlaps function, which you can use to check before using insort.

I assume you made a mistake with your numbers as (250, 400) overlaps (150, 300). overlaps() can be written like so:

def overlaps(inlist, inrange):
    for min, max in inlist:
        if min < inrange[0] < max and max < inrange[1]:
            return True
    return False
jp48
  • 1,186
  • 10
  • 17
Nick Presta
  • 28,134
  • 6
  • 57
  • 76
  • 5
    Yes, this would work. Unfortunately, inserting in the middle of the list is _O(n)_. But, it would work. And yes, I did make a mistake with my numbers. Oops. – Omnifarious Apr 03 '11 at 09:05
17

Use SortedDict from the SortedCollection.

A SortedDict provides the same methods as a dict. Additionally, a SortedDict efficiently maintains its keys in sorted order. Consequently, the keys method will return the keys in sorted order, the popitem method will remove the item with the highest key, etc.

I've used it - it works. Unfortunately I don't have the time now to do a proper performance comparison, but subjectively it seems to have become faster than the bisect module.

Milind R
  • 676
  • 1
  • 8
  • 20
  • 6
    Any idea why this isn't in standard Python? It amuses me that Python got hash based dictionaries from the start, and that C++ got tree based ones first. – Omnifarious Feb 03 '17 at 15:23
4

Cheap searching and cheap insertion tend to be at odds. You could use a linked list for the data structure. Then searching to find the insertion point for a new element is O(n), and the subsequent insertion of the new element in the correct location is O(1).

But you're probably better off just using a straightforward Python list. Random access (i.e. finding your spot) takes constant time. Insertion in the correct location to maintain the sort is theoretically more expensive, but that depends on how the dynamic array is implemented. You don't really pay the big price for insertions until reallocation of the underlying array takes place.

Regarding checking for date range overlaps, I happen to have had the same problem in the past. Here's the code I use. I originally found it in a blog post, linked from an SO answer, but that site no longer appears to exist. I actually use datetimes in my ranges, but it will work equally well with your numeric values.

def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    '''Returns true if two ranges intersect. Note that if two
    ranges are adjacent, they do not intersect.

    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges  
    '''

    if dt2end <= dt1start or dt2start >= dt1end:
        return False

    return  dt1start <= dt2end and dt1end >= dt2start

Here are the unit tests to prove it works:

from nose.tools import eq_, assert_equal, raises

class test_dt_windows_intersect():
    """
    test_dt_windows_intersect
    Code based on: 
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges  

               |-------------------|         compare to this one
    1               |---------|              contained within
    2          |----------|                  contained within, equal start
    3                  |-----------|         contained within, equal end
    4          |-------------------|         contained within, equal start+end
    5     |------------|                     overlaps start but not end
    6                      |-----------|     overlaps end but not start
    7     |------------------------|         overlaps start, but equal end
    8          |-----------------------|     overlaps end, but equal start
    9     |------------------------------|   overlaps entire range

    10 |---|                                 not overlap, less than
    11 |-------|                             not overlap, end equal
    12                              |---|    not overlap, bigger than
    13                             |---|     not overlap, start equal
    """


    def test_contained_within(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,6,40),
        )

    def test_contained_within_equal_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,6,30),
        )

    def test_contained_within_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,7,0),
        )

    def test_contained_within_equal_start_and_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
        )

    def test_overlaps_start_but_not_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30),   datetime(2009,10,1,6,30),
        )

    def test_overlaps_end_but_not_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30),   datetime(2009,10,1,7,30),
        )

    def test_overlaps_start_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30),   datetime(2009,10,1,7,0),
        )

    def test_equal_start_overlaps_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,30),
        )

    def test_overlaps_entire_range(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,8,0),
        )

    def test_not_overlap_less_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,5,30),
        )

    def test_not_overlap_end_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0),    datetime(2009,10,1,6,0),
        )

    def test_not_overlap_greater_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,30),    datetime(2009,10,1,8,0),
        )

    def test_not_overlap_start_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0),    datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,0),    datetime(2009,10,1,8,0),
        )
ire_and_curses
  • 68,372
  • 23
  • 116
  • 141
1

Maybe the module bisect could be better than the simple following function ? :

li = [(0, 100), (150, 220), (250, 400), (500, 1000)]


def verified_insertion(x,L):
    u,v = x
    if v<L[0][0]:
        return [x] + L
    elif u>L[-1][0]:
        return L + [x]
    else:
        for i,(a,b) in enumerate(L[0:-1]):
            if a<u and v<L[i+1][0]:
                return L[0:i+1] + [x] + L[i+1:]
    return L 


lo = verified_insertion((-10,-2),li)

lu = verified_insertion((102,140),li)

le = verified_insertion((222,230),li)

lee = verified_insertion((234,236),le) # <== le

la = verified_insertion((408,450),li)

ly = verified_insertion((2000,3000),li)

for w in (lo,lu,le,lee,la,ly):
    print li,'\n',w,'\n'

The function returns a list without modifying the list passed as argument.

result

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(-10, -2), (0, 100), (150, 220), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (102, 140), (150, 220), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (222, 230), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (222, 230), (234, 236), (250, 400), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (250, 400), (408, 450), (500, 1000)] 

[(0, 100), (150, 220), (250, 400), (500, 1000)] 
[(0, 100), (150, 220), (250, 400), (500, 1000), (2000, 3000)] 
eyquem
  • 26,771
  • 7
  • 38
  • 46
0

To answer your question:

Is there a data structure like that available in Python?

No there is not. But you can easily build one yourself using a list as the basic structure and code from the bisect module to keep the list in order and check for overlaps.

class RangeList(list):
"""Maintain ordered list of non-overlapping ranges"""
    def add(self, range):
    """Add a range if no overlap else reject it"""
        lo = 0; hi = len(self)
        while lo < hi:
            mid = (lo + hi)//2
            if range < self[mid]: hi = mid
            else: lo = mid + 1
        if overlaps(range, self[lo]):
            print("range overlap, not added")
        else:
            self.insert(lo, range)

I leave the overlaps function as an exercise. (This code is untested and may need some tweeking)

Don O'Donnell
  • 4,538
  • 3
  • 26
  • 27
  • 2
    This works, but you've just re-implemented part of the bisect module. :-) – Omnifarious Apr 03 '11 at 09:06
  • @Omnifarious: You are right. My original intent was to copy and extend the bisect function, thinking that it needed a different comparison operator to handle the range compares. Turns out it didn't so I should have just called the bisect function. – Don O'Donnell Apr 09 '11 at 06:22