Partitioning a set of values in Python

Question

I wrote a function that tries to partition a list of values into contiguous chunks. A chunk is a set of values in which values from the start through to the end would be present in the list. As an example, consider the list [1,2,3,7,7,8,9]. This would be partitioned into {1:3, 7:3}. I could later read that as an interval that starts at 1 of length 3 and an interval that starts at 7 of length 3.

I came up with the following code :-

values = list(set(values))
values.sort()
ranges = {}
for value in values:
    if value - i in ranges:
        ranges[value-i] += 1
        i += 1
    else:
        i = 1
        ranges[value] = 0

I'm curious if this is the best way to do it. What would be the time complexity of converting a list to a set and back into a list? I'm guessing that would be O(n).

Do you have any suggestions on how this could be done better?

In theory (but probably not in practice), there's value in replace the sort with multiple uses of hashing. Use [this algorithm for reconstructing an itinerary from one-way tickets](http://stackoverflow.com/a/3077721/2144669), where each distinct number `x` in the input is a ticket from `x` to `x+1`. — David Eisenstat, Feb 22 '15 at 14:40
This looks fine. You can't get any better than O(n) since you need go through each value in the array. — Bartlomiej Lewandowski, Feb 22 '15 at 14:55

score 2 · Accepted Answer · edited May 23 '17 at 11:50

We can do linear:

values = [7, 3, 2, 7, 1, 9, 8]

range_by_min, range_by_max = {}, {}

for v in values:
    range_by_min[v] = range_by_max[v] = [v, v]

for v in values:
    if v - 1 in range_by_max and v in range_by_min:
        p, q = range_by_max[v - 1], range_by_min[v]
        del range_by_min[q[0]]
        del range_by_max[p[1]]
        p[1] = q[1]
        range_by_max[p[1]] = p

print(range_by_min, range_by_max)

result = {k: v[1] - v[0] + 1 for k, v in range_by_min.iteritems()}
print(result)

Result:

({1: [1, 3], 7: [7, 9]}, {3: [1, 3], 9: [7, 9]})
{1: 3, 7: 3}

The idea is to keep two dictionaries that store ranges (a range is represented as a list of it's minimum and maximum value). The first maps the minimum key to the range. The second maps the maximum key to the range.

Then we traverse the list of values and we join neighboring ranges. If we are visiting 4 and there is a range 4..6 then we check if there is a range ending at 3, let's say 1..3. So we join them in one: 1..6.

The algorithm is linear to the hash table accesses. Since we expect constant access to the dictionaries, the expected running time is linear to the size of values. With this way we don't even have to sort the input array.

EDIT:

I saw the link suggested by David Eisenstat. Based on this link, the implementation can be updated to use only one dictionary:

ranges = {v: [v, v] for v in values}

for v in values:
    if v - 1 in ranges and v in ranges:
        p, q = ranges[v - 1], ranges[v]
        if p[1] == v - 1 and q[0] == v:
            if q[0] != q[1]:
                del ranges[q[0]]
            if p[0] != p[1]:
                del ranges[p[1]]
            p[1] = q[1]
            ranges[p[1]] = p

result = {k: v[1] - v[0] + 1 for k, v in ranges.iteritems() if k == v[0]}

Partitioning a set of values in Python

1 Answers1