8

Assume I have a list of IP ranges (last term only) that may or may not overlap:

('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')

I'm looking for a way to identify any overlapping ranges and consolidate them into single ranges.

('1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3')

Current thought for algorithm is to expand all ranges into a list of all IPs, eliminate duplicates, sort, and consolidate any consecutive IPs.

Any more python-esque algorithm suggestions?

Jonathan
  • 3,464
  • 9
  • 46
  • 54
  • Sorting and then consolidating sounds like a pretty good solution to me. O(nlg(n)). Not sure if this can be improved. – Colin D Apr 13 '12 at 17:27
  • @ColinD I'm sure he'd like to avoid going from ranges to lists – agf Apr 13 '12 at 17:40

6 Answers6

7

Here is my version, as a module. My algorithm is identical to the one lunixbochs mentions in his answer, and the conversion from range string to integers and back is nicely modularized.

import socket, struct

def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]

def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)

def expandrange(rng):
    # expand '1.1.1.1-7' to ['1.1.1.1', '1.1.1.7']
    start, end = [ip.split('.') for ip in rng.split('-')]
    return map('.'.join, (start, start[:len(start) - len(end)] + end))

def compressrange((start, end)):
    # compress ['1.1.1.1', '1.1.1.7'] to '1.1.1.1-7'
    start, end = start.split('.'), end.split('.')
    return '-'.join(map('.'.join,
          (start, end[next((i for i in range(4) if start[i] != end[i]), 3):])))

def strings_to_ints(ranges):
    # turn range strings into list of lists of ints
    return [map(ip2long, rng) for rng in map(expandrange, ranges)]

def ints_to_strings(ranges):
    # turn lists of lists of ints into range strings
    return [compressrange(map(long2ip, rng)) for rng in ranges]

def consolodate(ranges):
    # join overlapping ranges in a sorted iterable
    iranges = iter(ranges)
    startmin, startmax = next(iranges)
    for endmin, endmax in iranges:
        # leave out the '+ 1' if you want to join overlapping ranges
        # but not consecutive ranges.
        if endmin <= (startmax + 1):
            startmax = max(startmax, endmax)
        else:
            yield startmin, startmax
            startmin, startmax = endmin, endmax
    yield startmin, startmax

def convert_consolodate(ranges):
    # convert a list of possibly overlapping ip range strings
    # to a sorted, consolodated list of non-overlapping ip range strings
    return list(ints_to_strings(consolodate(sorted(strings_to_ints(ranges)))))

if __name__ == '__main__':
    ranges = ('1.1.1.1-7',
              '2.2.2.2-10',
              '3.3.3.3-3.3.3.3',
              '1.1.1.4-25',
              '2.2.2.4-6')
    print convert_consolodate(ranges)
    # prints ['1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3']
agf
  • 171,228
  • 44
  • 289
  • 238
  • I think that `ranges = ('1.1.1.1-7','1.1.1.8-10')` should be merged to `['1.1.1.1-10']`. Now the output is `['1.1.1.1-7', '1.1.1.8-10']` – ovgolovin Apr 13 '12 at 21:19
  • @ovgolovin I took him literally. Those are "consecutive" ranges, not "overlapping" ranges. – agf Apr 13 '12 at 22:00
  • @ovgolovin Really, you're right, he almost certainly wants to join consecutive ranges. I've changed my code to work on integers rather than lists-of-integers so it can easily handle consecutive ranges. – agf Apr 13 '12 at 22:38
  • Only one thing that bothers me (in my code too) that using integers to represent IPs would lead to merging consecutive but probably unwanted ranges (e.g. `('10.10.255.1-255','10.11.0.0-50')` to `['10.10.255.1-10.11.0.50']`. I'm not sure if it's what he wants. I think these ranges are for human to ease maintenance, and `['10.10.255.1-10.11.0.50']` is a bit counterintuitive (at least for me it would be better to have it as two ranges `('10.10.255.1-255','10.11.0.0-50')`) – ovgolovin Apr 13 '12 at 22:49
  • @ovgolovin It's true. He mentions that only the last part of the ip will be represented as a range (though my code is generalized to handle multipart ranges), so he probably doesn't want that -- but it's easy enough to split those ranges into parts if he needs to. That's the kind of corner case we don't have enough info to judge what is better, and the code is cleaner leaving that in :) – agf Apr 13 '12 at 22:56
  • I'm trying to utilize this code snippet in Python 3, however I'm running into an issue with the `consolidate(ranges)` function, I get `ValueError: too many values to unpack (expected 2)` when it gets to `startmin, startmax = next(iranges)`. I was wondering if this use of `next(iter())` was deprecated and I need to use something new. – Pyrometheous Jun 15 '21 at 21:51
  • 1
    @Pyrometheous the return type of `map` changed in Python 3. If you wrap all the calls to `map` with calls to `list` it should work. Sorry, don't have time to test / update it right now. – agf Jun 17 '21 at 04:32
1

Once I faced the same problem. The only difference was that I had to efficiently keep line segments in a list. It was for a Monte-Carlo simulation. And the newly randomly generated line segments had to be added to the existing sorted and merged line segments.

I adapted the algorithm to your problem using the answer by lunixbochs to convert IPs to integers.

This solution allows to add a new IP range to the existing list of already merged ranges (while other solutions rely on having the list-of-ranges-to-merge sorted and do not allow adding a new range to already merged range list). It's done in add_range function by using bisect module to find the place where to insert the new IP range and then deleting the redundant IP intervals and inserting the new range with adjusted boundaries so that the new range embraces all the deleted ranges.

import socket
import struct
import bisect


def ip2long(ip):
    '''IP to integer'''
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]


def long2ip(n):
    '''integer to IP'''
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)


def get_ips(s):
    '''Convert string IP interval to tuple with integer representations of boundary IPs
'1.1.1.1-7' -> (a,b)'''
    s1,s2 = s.split('-')
    if s2.isdigit():
        s2 = s1[:-1] + s2
    return (ip2long(s1),ip2long(s2))


def add_range(iv,R):
    '''add new Range to already merged ranges inplace'''
    left,right = get_ips(R)
    #left,right are left and right boundaries of the Range respectively

    #If this is the very first Range just add it to the list
    if not iv:
        iv.append((left,right))
        return

    #Searching the first interval with left_boundary < left range side
    p = bisect.bisect_right(iv, (left,right)) #place after the needed interval
    p -= 1 #calculating the number of interval basing on the position where the insertion is needed


    #Interval: |----X----| (delete)    
    #Range:   <--<--|----------| (extend)
    #Detect if the left Range side is inside the found interval
    if p >=0: #if p==-1 then there was no interval found
        if iv[p][1]>= right:
            #Detect if the Range is completely inside the interval
            return #drop the Range; I think it will be a very common case

        if iv[p][1] >= left-1:
            left = iv[p][0] #extending the left Range interval
            del iv[p] #deleting the interval from the interval list
            p -= 1 #correcting index to keep the invariant


    #Intervals:   |----X----| |---X---| (delete)    
    #Range:  |-----------------------------|        
    #Deleting all the intervals which are inside the Range interval
    while True:
        p += 1
        if p >= len(iv) or iv[p][0] >= right or iv[p][1] > right:
            'Stopping searching for the intervals which is inside the Range interval'
            #there are no more intervals or
            #the interval is to the right of the right Range side
            # it's the next case (right Range side is inside the interval)
            break 
        del iv[p] #delete the now redundant interval from the interval list
        p -= 1 #correcting index to keep the invariant


    #Interval: |--------X--------| (delete)    
    #Range: |-----------|-->--> (extend)    
    #Working the case when the right Range side is inside the interval
    if p < len(iv) and iv[p][0] <= right-1:
        #there is no condition for right interval side since
        #this case would have already been worked in the previous block
        right = iv[p][1] #extending the right Range side
        del iv[p] #delete the now redundant interval from the interval list
        #No p -= 1, so that p is no pointing to the beginning of the next interval
        #which is the position of insertion


    #Inserting the new interval to the list
    iv.insert(p, (left,right))


def merge_ranges(ranges):
    '''Merge the ranges'''
    iv = []
    for R in ranges:
        add_range(iv,R)
    return ['-'.join((long2ip(left),long2ip(right))) for left,right in iv]

ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6')
print(merge_ranges(ranges))

Output:

['1.1.1.1-1.1.1.25', '2.2.2.2-2.2.2.10', '3.3.3.3-3.3.3.3']

This was a lot of fun for me to code! Thank you for that :)

ovgolovin
  • 13,063
  • 6
  • 47
  • 78
  • I wonder if it's possible to rewrite this algorithm that it smoothly extends the existing ranges in the list (they would need to be `list`s not `tuple`s) instead of deleting the existing ranges and adding the new all-embracing one at the end with `iv.insert(p, (left,right))`. I just used [`blist`](http://stutzbachenterprises.com/blist/blist.html#blist) to avoid O(n) on deletions and insertions in traditional `list`. But I wonder if somebody could come up with a good algorithmic solution that would avoid *redundant* deletions and insertions. – ovgolovin Apr 13 '12 at 21:04
  • I don't think you can avoid inserts / deletions without pre-sorted data, which would negate the need for your method. It looks like your algorithm is O(nlogn), assuming your inertions and deletions are no worse than O(logn) which is the same as the algorithm lunixbochs mentioned and I use -- except that yours does each O(logn) step manually, while ours depends on a sort and then merges the ranges in linear time. Probably, our version will be faster in CPython because the sort is done in C. – agf Apr 13 '12 at 22:51
  • @agf This algorithm is just an adaptation from the old algorithm made by me to keep the line segments randomly generated. Each step I generated a few new line segments and had to add them to the existing ones, so I needed an effective in-line merging algorithm. – ovgolovin Apr 13 '12 at 22:56
  • 1
    @agf The data is always sorted in the `iv` list. So it's possible not to delete intervals and add a new one, but to extend the existing interval thus avoiding deletions and insertions. But it's very difficult to consider all the cases when the interval for extensions may be taken from the different parts of the algorithm. I think there would be a lot of corner cases and a lot of `if-else`. And I don't like personally such a way of implementing it. :) – ovgolovin Apr 13 '12 at 23:00
  • 1
    @agf I've come up with the solution of extending the existing interval if possible without redundant deletions and insertions (if there 3 intervals to be raplaced with one, 2 still need to be deleted, but one may be extended). I didn't give it a decent testing, so I'll leave this solution [here](http://ideone.com/ME8aU). It turned out to be much simple than I expected. The only downside is that in take a bit more memory since `tuple` as I know is more memory efficient than list. But it's impossible to use `tuple` since I need in-place assignments. – ovgolovin Apr 13 '12 at 23:28
  • *It looks like your algorithm is O(nlogn)* No, its worst case is Theta(n^2), since `.insert()` has to shift all of the elements to the right of the insertion point. – oldboy Apr 14 '12 at 15:11
  • Or, if you actually used `blist`, Theta(nlog^2n). – oldboy Apr 14 '12 at 15:14
  • @oldboy Accroding to the [docs](http://stutzbachenterprises.com/blist/blist.html#blist), insertion is O(log(n)), so n insertions would have O(n*log(n)). Also, in the [last version](http://ideone.com/ME8aU) of the algorithm I tried to avoid redundant insertions and deletions. – ovgolovin Apr 14 '12 at 15:17
  • It's the bisection that gets you with the blist version. – oldboy Apr 14 '12 at 15:19
  • would Interval tree simplify the code? [Removing overlapping lists](https://stackoverflow.com/a/16314541/4279) – jfs May 13 '20 at 13:08
1

Convert your ranges into pairs of numbers. These functions will convert individual IPs to and from integer values.

def ip2long(ip):
    packed = socket.inet_aton(ip)
    return struct.unpack("!L", packed)[0]

def long2ip(n):
    unpacked = struct.pack('!L', n)
    return socket.inet_ntoa(unpacked)

Now you can sort/merge the edges of each range as numbers, then convert back to IPs to get a nice representation. This question about merging time ranges has a nice algorithm.


  1. Parse your strings of 1.1.1.1-1.1.1.2 and 1.1.1.1-2 into a pair of numbers. For the latter format, you could do:

    x = '1.1.1.1-2'
    first, add = x.split('-')
    second = first.rsplit('.', 1)[0] + '.' + add
    pair = ip2long(first), ip2long(second)
    
  2. Merge the overlapping ranges using simple number comparisons.

  3. Convert back to string representation (still assumes latter format):

    first, second = pair
    first = long2ip(first) + '-' + long2ip(second).rsplit('.', 1)[1]
    
Community
  • 1
  • 1
lunixbochs
  • 21,757
  • 2
  • 39
  • 47
0

Unify format of your ips, turn range into a pair of ints.

Now the task is much simpler - "consolidate" integer range. I believe there are a lot of existing efficient algorithm to do that, below only my naive try:

>>> orig_ranges = [(1,5), (7,12), (2,3), (13,13), (13,17)] # should result in (1,5), (7,12), (13,17)
>>> temp_ranges = {}    
>>> for r in orig_ranges:
        temp_ranges.setdefault(r[0], []).append('+')
        temp_ranges.setdefault(r[1], []).append('-')

>>> start_count = end_count = 0
>>> start = None
>>> for key in temp_ranges:
        if start is None:
            start = key
        start_count += temp_ranges[key].count('+')
        end_count += temp_ranges[key].count('-')
        if start_count == end_count:
            print start, key
            start = None
            start_count = end_count = 0

1 5
7 12
13 17

The general idea is the next: after we put ranges one onto another (in temp_ranges dict), we may find new composed ranges simply by counting beginnings and endings of original ranges; once we got equality, we found a united range.

Roman Bodnarchuk
  • 29,461
  • 12
  • 59
  • 75
0

I had these lying around in case you need em, using socket/struct is probably better way to go though

def ip_str_to_int(address):
    """Convert IP address in form X.X.X.X to an int.

    >>> ip_str_to_int('74.125.229.64')
    1249764672

    """
    parts = address.split('.')
    parts.reverse()
    return sum(int(v) * 256 ** i for i, v in enumerate(parts))


def ip_int_to_str(address):
    """Convert IP address int into the form X.X.X.X.

    >>> ip_int_to_str(1249764672)
    '74.125.229.64'

    """
    parts = [(address & 255 << 8 * i) >> 8 * i for i in range(4)]
    parts.reverse()
    return '.'.join(str(x) for x in parts)
Jason Keene
  • 1,085
  • 3
  • 10
  • 20
  • This doesn't answer the question, there are already several versions of this in the other answers, and it's also not necessary for solving the problem. If you want to post this as an answer, there is a question where it would be appropriate: [Conversion from IP string to integer, and backward in Python](http://stackoverflow.com/questions/5619685/conversion-from-ip-string-to-integer-and-backward-in-python). – agf Apr 13 '12 at 20:04