8

Python has a range method, which allows for stuff like:

>>> range(1, 6)
[1, 2, 3, 4, 5]

What I’m looking for is kind of the opposite: take a list of numbers, and return the start and end.

>>> magic([1, 2, 3, 4, 5])
[1, 5] # note: 5, not 6; this differs from `range()`

This is easy enough to do for the above example, but is it possible to allow for gaps or multiple ranges as well, returning the range in a PCRE-like string format? Something like this:

>>> magic([1, 2, 4, 5])
['1-2', '4-5']
>>> magic([1, 2, 3, 4, 5])
['1-5']

Edit: I’m looking for a Python solution, but I welcome working examples in other languages as well. It’s more about figuring out an elegant, efficient algorithm. Bonus question: is there any programming language that has a built-in method for this?

Mathias Bynens
  • 144,855
  • 52
  • 216
  • 248
  • I have a suspicion that there's no better way than iterating through the list, which is easy to write on your own. – trutheality Feb 27 '12 at 19:02
  • @trutheality Me too, hence this question. I’m hoping there’s an _elegant_ solution that I’m missing here. Fingers crossed! – Mathias Bynens Feb 27 '12 at 19:05

6 Answers6

14

A nice trick to simplify the code is to look at the difference of each element of the sorted list and its index:

a = [4, 2, 1, 5]
a.sort()
print [x - i for i, x in enumerate(a)]

prints

[1, 1, 2, 2]

Each run of the same number corresponds to a run of consecutive numbers in a. We can now use itertools.groupby() to extract these runs. Here's the complete code:

from itertools import groupby

def sub(x):
    return x[1] - x[0]

a = [5, 3, 7, 4, 1, 2, 9, 10]
ranges = []
for k, iterable in groupby(enumerate(sorted(a)), sub):
     rng = list(iterable)
     if len(rng) == 1:
         s = str(rng[0][1])
     else:
         s = "%s-%s" % (rng[0][1], rng[-1][1])
     ranges.append(s)
print ranges

printing

['1-5', '7', '9-10']
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • 1
    Nice! One (stupid) "problem" is the `list` call, which means you end up copying all the elements of the list in chunks unnecessarily, which doesn't matter at all in 99% of cases but can be avoided by using something like the `rangestr` function from my answer. – Danica Feb 27 '12 at 23:57
5

Sort numbers, find consecutive ranges (remember RLE compression?).

Something like this:

input = [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50]

output = []
first = last = None # first and last number of current consecutive range
for item in sorted(input):
  if first is None:
    first = last = item # bootstrap
  elif item == last + 1: # consecutive
    last = item # extend the range
  else: # not consecutive
    output.append((first, last)) # pack up the range
    first = last = item
# the last range ended by iteration end
output.append((first, last))

print output

Result: [(1, 3), (5, 9), (20, 23), (50, 50)]. You figure out the rest :)

9000
  • 39,899
  • 9
  • 66
  • 104
4

I thought you might like my generalised clojure solution.

(def r [1 2 3 9 10])

(defn successive? [a b]
  (= a (dec b)))

(defn break-on [pred s]
  (reduce (fn [memo n]
            (if (empty? memo)
              [[n]]
              (if (pred (last (last memo)) n)
                (conj (vec (butlast memo))
                      (conj (last memo) n))
                (conj memo [n]))))
          []
          s))

(break-on successive? r)
gf3
  • 481
  • 2
  • 11
2

This is kind of elegant but also kind of disgusting, depending on your point of view. :)

import itertools

def rangestr(iterable):
    end = start = iterable.next()
    for end in iterable:
        pass
    return "%s" % start if start == end else "%s-%s" % (start, end)

class Rememberer(object):
    last = None

class RangeFinder(object):
    def __init__(self):
        self.o = Rememberer()

    def __call__(self, x):
        if self.o.last is not None and x != self.o.last + 1:
            self.o = Rememberer()
        self.o.last = x
        return self.o

def magic(iterable):
    return [rangestr(vals) for k, vals in
            itertools.groupby(sorted(iterable), RangeFinder())]


>>> magic([5,7,9,8,6, 21,20, 3,2,1, 22,23, 50])
['1-3', '5-9', '20-23', '50']

Explanation: it uses itertools.groupby to group the sorted elements together by a key, where the key is a Rememberer object. The RangeFinder class keeps a Rememberer object as long as a consecutive bunch of items belongs to the same range block. Once you've passed out of a given block, it replaces the Rememberer so that the key won't compare equal and groupby will make a new group. As groupby walks over the sorted list, it passes the elements one-by-one into rangestr, which constructs the string by remembering the first and the last element and ignoring everything in between.

Is there any practical reason to use this instead of 9000's answer? Probably not; it's basically the same algorithm.

Community
  • 1
  • 1
Danica
  • 28,423
  • 6
  • 90
  • 122
  • I also considered something along these lines, using pure iteration to compute a catamorphism :) But I could not invent an FP-esque solution which is not long and would not require pattern matching. Alas, this is missing from Python. – 9000 Feb 27 '12 at 21:38
  • 1
    The try/except block in `rangestr()` can be replaced by `for end in iterable: pass`. Also note that `iterable.next()` should be written as `next(iterable)` starting in Python 2.6. – Sven Marnach Feb 28 '12 at 00:04
  • Good point, the for loop is nicer -- I switched to that. And I do know about next() but didn't use it because there's still a lot of 2.5 users out there. – Danica Feb 28 '12 at 03:53
2

Since 9000 beat me to it, I'll just post the second part of the code, that prints pcre-like ranges from the previously computed output plus the added type check:

for i in output:
    if not isinstance(i, int) or i < 0:
        raise Exception("Only positive ints accepted in pcre_ranges")
result = [ str(x[0]) if x[0] == x[1] else '%s-%s' % (x[0], x[1]) for x in output ]
print result

Output: ['1-3', '5-9', '20-23', '50']

Frg
  • 406
  • 2
  • 3
2

Let's try generators!

# ignore duplicate values
l = sorted( set( [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50] ) )

# get the value differences 
d = (i2-i1 for i1,i2 in zip(l,l[1:]))

# get the gap indices
gaps = (i for i,e in enumerate(d) if e != 1)

# get the range boundaries
def get_ranges(gaps, l):
  last_idx = -1
  for i in gaps:
    yield (last_idx+1, i)
    last_idx = i
  yield (last_idx+1,len(l)-1)

# make a list of strings in the requested format (thanks Frg!)
ranges = [ "%s-%s" % (l[i1],l[i2]) if i1!=i2 else str(l[i1]) \
  for i1,i2 in get_ranges(gaps, l) ]

This has become rather scary, I think :)

moooeeeep
  • 31,622
  • 22
  • 98
  • 187