8

I'd like to know if there is a simple (or already created) way of doing the opposite of this: Generate List of Numbers from Hyphenated.... This link could be used to do:

>> list(hyphen_range('1-9,12,15-20,23'))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 20, 23]:

I'm looking to do the opposite (note that 10 and 21 are included so it would be compatible with the range function, where range(1,10)=[1,2,3,4,5,6,7,8,9]):

>> list_to_ranges([1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 20, 23])
'1-10,12,15-21,23'

Eventually, I would like to have the output also incorporate a step where the last number of the output indicates the step:

>> list_to_ranges([1, 3, 5, 7, 8, 9, 10, 11])
'1-13:2,8,10'

Essentially, this would end up being kind of like an "inverse" range function

>> tmp = list_to_ranges([1, 3, 5])
>> print tmp
'1-7:2'
>> range(1, 7, 2)
[1, 3, 5]

My guess is that there is no really easy/simple way to do this, but I thought I would ask on here before I go make some brute force, long method.

EDIT

Using the code from an answer to this post as an example, I came up with a simple way to do the first part. But I think that identifying the patterns to do steps would be a bit harder.

from itertools import groupby
from operator import itemgetter

data = [ 1,  4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
print data, '\n'

str_list = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
   ilist = map(itemgetter(1), g)
   print ilist
   if len(ilist) > 1:
      str_list.append('%d-%d' % (ilist[0], ilist[-1]+1))
   else:
      str_list.append('%d' % ilist[0])
print '\n', ','.join(str_list)

EDIT 2

Here is my attempt at including the step size...it is pretty close, but the first numbers get repeated. I think that with a little bit of tweaking of this, it will be close to what I want - or at least good enough.

import numpy as np
from itertools import groupby

def list_to_ranges(data):
   data = sorted(data)
   diff_data = np.diff(data).tolist()
   ranges = []
   i = 0
   for k, iterable in groupby(diff_data, None):
      rng = list(iterable)
      step = rng[0]
      if len(rng) == 1:
         ranges.append('%d' % data[i])
      elif step == 1:
         ranges.append('%d-%d' % (data[i], data[i+len(rng)]+step))
      else:
         ranges.append('%d-%d:%d' % (data[i], data[i+len(rng)]+step, step))
      i += len(rng)
   return ','.join(ranges)

data = [1, 3, 5, 6, 7, 11, 13, 15, 16, 17, 18, 19, 22, 25, 28]
print data
data_str = list_to_ranges(data)
print data_str

_list = []
for r in data_str.replace('-',':').split(','):
   r = [int(a) for a in r.split(':')]
   if len(r) == 1:
      _list.extend(r)
   elif len(r) == 2:
      _list.extend(range(r[0], r[1]))
   else:
      _list.extend(range(r[0], r[1], r[2]))
print _list
print list(set(_list))
Community
  • 1
  • 1
Scott B
  • 2,542
  • 7
  • 30
  • 44
  • 1
    The brute force method you talk about, doesn't necessarily have to be that long at all.. – Milo Wielondek Mar 23 '12 at 23:12
  • I agree. You're going to have to parse the list in order to indentify patterns, especially if you want to add non-unity step recognition as well. – Joel Cornett Mar 23 '12 at 23:37
  • 3
    There is some ambiguity here: `1-13:2,8,10` is the same as `1-7:2,7-11`. Before we can really look at an algorithm, you'll have to give a more precise definition of what you want. – Winston Ewert Mar 23 '12 at 23:39
  • @Winston Ewert: agreed. This is something I thought about...either is a valid output. I don't really care which result occurs, as long as they are equivalent. – Scott B Mar 23 '12 at 23:45
  • 1
    Ok, but then `1,3,5,7,8,9,10,11` would also be equivalent as would `1,3,5,7-11`. Surely you've got some sort of requirement beyond equivalence. – Winston Ewert Mar 23 '12 at 23:50
  • @Winston Ewert: I guess my "idea" criteria would be that I want the list to be as short as possible...this becomes quite difficult when data becomes mixed. For example: 1,3,5,7,8,9,10,11,13,15 would ideally become 1-17:2,8,10. In my attempts thus far though, I just work left to right and stop when the current pattern/streak is broken (see my "Edit 2"). – Scott B Mar 24 '12 at 00:19

5 Answers5

6

One approach could be "eating" piece by piece the input sequence and store the partial range results untill you've got them all:

def formatter(start, end, step):
    return '{}-{}:{}'.format(start, end, step)
    # return '{}-{}:{}'.format(start, end + step, step)

def helper(lst):
    if len(lst) == 1:
        return str(lst[0]), []
    if len(lst) == 2:
        return ','.join(map(str,lst)), []

    step = lst[1] - lst[0]
    for i,x,y in zip(itertools.count(1), lst[1:], lst[2:]):
        if y-x != step:
            if i > 1:
                return formatter(lst[0], lst[i], step), lst[i+1:]
            else:
                return str(lst[0]), lst[1:]
    return formatter(lst[0], lst[-1], step), []

def re_range(lst):
    result = []
    while lst:
        partial,lst = helper(lst)
        result.append(partial)
    return ','.join(result)

I test it with a bunch of unit tests and it passed them all, it can handle negative numbers too, but they'll look kind of ugly (it's really anybody's fault).

Example:

>>> re_range([1,  4,5,6, 10, 15,16,17,18, 22, 25,26,27,28])
'1,4-6:1,10,15-18:1,22,25-28:1'
>>> re_range([1, 3, 5, 7, 8, 9, 10, 11, 13, 15, 17])
'1-7:2,8-11:1,13-17:2'

Note: I wrote the code for Python 3.


Performance

I didn't put any performance effort in the solution above. In particular, every time a list get re-builded with slicing, it might take some time if the input list has a particular shape. So, the first simple improvement would be using itertools.islice() where possible.

Anyway here's another implementation of the same algorithm, that scan through the input list with a scan index instead of slicing:

def re_range(lst):
    n = len(lst)
    result = []
    scan = 0
    while n - scan > 2:
        step = lst[scan + 1] - lst[scan]
        if lst[scan + 2] - lst[scan + 1] != step:
            result.append(str(lst[scan]))
            scan += 1
            continue

        for j in range(scan+2, n-1):
            if lst[j+1] - lst[j] != step:
                result.append(formatter(lst[scan], lst[j], step))
                scan = j+1
                break
        else:
            result.append(formatter(lst[scan], lst[-1], step))
            return ','.join(result)

    if n - scan == 1:
        result.append(str(lst[scan]))
    elif n - scan == 2:
        result.append(','.join(map(str, lst[scan:])))

    return ','.join(result)

I stopped working on it once it got ~65% faster than the previous top solution, it seemed enough :)

Anyway I'd say that there might still be room for improvement (expecially in the middle for-loop).

Rik Poggi
  • 28,332
  • 6
  • 65
  • 82
  • Works great with my tests too, thanks. The only thing I changed was to add step to the middle term from the helper function (range doesn't include the second term, i.e.: range(1,6) = [1,2,3,4,5] -> it doesn't include 6). So for your first example, instead of 4-6:1, it should be 4-7:1 instead. So use ....format(lst[0], lst[i]+step, step), etc... – Scott B Mar 26 '12 at 15:55
  • @ScottB: I didn't noticed that you wanted that kind of behaviour. I'd say that it's a matter of taste, for example I don't like an hyphen string to show the last number and it should be up to `hypend_range` to fix that to behave like range (just [an example](https://gist.github.com/e0f4b3cc3430e6d99112)). I'll add your fix as a custom `formatter` :) – Rik Poggi Mar 26 '12 at 17:06
  • @ScottB: I'd say that is slow because there's the recursive and repeted cutting of the input list. Is there a real performance problem or you're just benchmarking? A better way would be doing the same thing, but just parsing the string without the recursion and the cutting, I'll think about the code... (In the meantime if you want to share your benchmarking enviroment I'll be happy to work with that instead of having to build a new one). – Rik Poggi Mar 26 '12 at 17:15
  • Ya, I was just doing some benchmarking to see how the two solutions scaled. Since your one does all the recursion, I figured it would get slow when the list of numbers is large. I actually have a modified version of mine (from "EDIT 2") that seems to consistently be the quickest one. One reason I liked your solution was that it generally seems to give shorter results than Kaidence's, but my modified method gives the same result as yours, but does it very quickly, even for very large amounts of data. I'll post my comparison as a new answer shortly... – Scott B Mar 26 '12 at 18:21
  • I posted my comparison of each method. – Scott B Mar 26 '12 at 18:32
  • @ScottB: I've updated the answer with a more performant implementation of the same algorithm. I've timed it with your benchmark and it seems to be pretty fast :) – Rik Poggi Mar 26 '12 at 21:16
  • What is formatter in python 2.7, do you know? I can't seem to find it when I search...I found a formatter module, but I don't think it is the same thing as what you used... – Scott B Mar 26 '12 at 21:52
2

This is similar to versions that handle the step-size-of-one case enumerated here but also handles the singleton (elements with no more than 2 elements in a sequence or repeated elements) and non-unitary step sizes (including negative step sizes). It also does not drop duplicates for lists like [1, 2, 3, 3, 4, 5].

As for runtime: it's done before you blink.

def ranges(L):
    """return a list of singletons or ranges of integers, (first, last, step)
    as they occur sequentially in the list of integers, L.

    Examples
    ========

    >>> list(ranges([1, 2, 4, 6, 7, 8, 10, 12, 13]))
    [1, (2, 6, 2), 7, (8, 12, 2), 13]
    >>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
    [(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]

    """
    if not L:
        return []
    r = []
    for i in L:
        if len(r) < 2:
            r.append(i)
            if len(r) == 2:
                d = r[1] - r[0]
        else:
            if i - r[1] == d:
                r[1] = i
            else:
                if r[1] - r[0] == d:
                    yield(r.pop(0))
                    r.append(i)
                    d = r[1] - r[0]
                else:
                    yield(tuple(r+[d]))
                    r[:] = [i]
    if len(r) == 1:
        yield(r.pop())
    elif r[1] - r[0] == d:
        for i in r:
            yield i
    else:
        yield(tuple(r+[d]))

The raw output can be modified as desired, e.g. actual range instances can be created.

def sranges(i):
    """return pretty string for output of ranges.

    Examples
    ========

    >>> sranges([1,2,4,6,7,8,10,12,13,15,16,17])
    '1, range(2, 8, 2), 7, range(8, 14, 2), 13, range(15, 18)'

    """
    out = []
    for i in ranges(i):
        if type(i) is int:
            out.append(str(i))
        elif i[-1] == 1:
            if i[0] == 0:
                out.append('range(%s)'%(i[1] + 1))
            else:
                out.append('range(%s, %s)'%(i[0], i[1] + 1))
        else:
            out.append('range(%s, %s, %s)'%(i[0], i[1] + i[2], i[2]))
    return ', '.join(out)
smichr
  • 16,948
  • 2
  • 27
  • 34
2

This is most likely what you are looking for.

Edit: I see you already found the post. My apologies.

To help with the second part, I've tinkered a bit myself. This is what I came up with:

from numpy import diff

data = [ 1, 3, 5, 7, 8, 9, 10, 11, 13, 15, 17 ]
onediff, twodiff = diff(data), diff(diff(data))
increments, breakingindices = [], []
for i in range(len(twodiff)):
    if twodiff[i] != 0:
        breakingindices.append(i+2) # Correct index because of the two diffs
        increments.append(onediff[i]) # Record the increment for this section

# Increments and breakingindices should be the same size
str_list = []
start = data[0]
for i in range(len(breakingindices)):
    str_list.append("%d-%d:%d" % (start, data[breakingindices[i]-1], increments[i]))
    start = data[breakingindices[i]]
str_list.append("%d-%d:%d" % (start, data[len(data)-1], onediff[len(onediff)-1]))
print str_list

For the given input list, this gives: ['1-7:2', '8-11:1', '13-17:2']. The code could do with a bit of cleanup, but this sorts with your problem assuming the grouping can be done sequentially.

{caution: for [1,2,3,5,6,7] this gives ['1-3:1', '5-5:2', '6-7:1'] instead of ['1-3:1', '5-7:1']}

smichr
  • 16,948
  • 2
  • 27
  • 34
Jos Kraaijeveld
  • 229
  • 2
  • 11
  • Yes, that is great...incidentally it is basically the same thing I just found as well (see my edit). Do you know of any easy way to make it use steps? – Scott B Mar 23 '12 at 23:25
  • Hrm, maybe it would be worth checking out [numpy's diff](http://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html) function. While the number in the diff'd list is the same, we can group them together. (diff([1,3,5] would return [2,2]). – Jos Kraaijeveld Mar 23 '12 at 23:36
  • Don't post just a link as an answer. Either provide the information from the link in your post, or make it a comment. – agf Mar 23 '12 at 23:37
  • @Kaidence: That was also my thought. I've got a decent start, including the use of numpy's diff, in my "Edit 2" to my post. – Scott B Mar 24 '12 at 00:20
  • @ScottB, I just updated my answer post with more clarification. – Jos Kraaijeveld Mar 24 '12 at 00:31
  • Thanks Kaidence. This does seem to work, but I accepted Rik's answer because his solution was a bit quicker in my timed tests. Thanks for the answer though, and clever way of doing it! The results were correct in my tests (but, like Rik's, I needed to add increment to the middle term returned, because range doesn't include the second term, i.e.: range(1,6) = [1,2,3,4,5] -> it doesn't include 6). – Scott B Mar 26 '12 at 15:57
  • Actually, your solution becomes quicker as the list of numbers to sort increases (especially if it becomes very large, then it is much, much, much quicker...). – Scott B Mar 26 '12 at 16:23
  • Great to hear :) Good luck with it! – Jos Kraaijeveld Mar 26 '12 at 17:13
  • 1
    It seems that I get the best results with a modified method of my "EDIT 2" post. I made a new answer to compare each of the three solutions (mine, yours, and Rik's). – Scott B Mar 26 '12 at 18:33
2

This is a comparison of the 3 methods. Change the amount of data and the density via the values below...no matter what values I use, the first solution seems to be the quickest for me. For very large sets of data, the third solution becomes very slow.

EDITED

Edited to include comments below and add in a new solution. The last solution seems to be the quickest now.

import numpy as np
import itertools
import random
import timeit

# --- My Solution --------------------------------------------------------------
def list_to_ranges1(data):
   data = sorted(data)
   diff_data = np.diff(data)
   ranges = []
   i = 0
   skip_next = False
   for k, iterable in itertools.groupby(diff_data, None):
      rng = list(iterable)
      step = rng[0]
      if skip_next:
         skip_next = False
         rng.pop()

      if len(rng) == 0:
         continue
      elif len(rng) == 1:
         ranges.append('%d' % data[i])
      elif step == 1:
         ranges.append('%d-%d' % (data[i], data[i+len(rng)]+step))
         i += 1
         skip_next = True
      else:
         ranges.append('%d-%d:%d' % (data[i], data[i+len(rng)]+step, step))
         i += 1
         skip_next = True
      i += len(rng)

   if len(rng) == 0 or len(rng) == 1:
      ranges.append('%d' % data[i])
   return ','.join(ranges)

# --- Kaidence Solution --------------------------------------------------------
# With a minor edit for use in range function
def list_to_ranges2(data):
   onediff = np.diff(data)
   twodiff = np.diff(onediff)
   increments, breakingindices = [], []
   for i in range(len(twodiff)):
       if twodiff[i] != 0:
           breakingindices.append(i+2)  # Correct index because of the two diffs
           increments.append(onediff[i]) # Record the increment for this section

  # Increments and breakingindices should be the same size
   str_list = []
   start = data[0]
   for i in range(len(breakingindices)):
       str_list.append("%d-%d:%d" % (start,
                                     data[breakingindices[i]-1] + increments[i],
                                     increments[i]))
       start = data[breakingindices[i]]
   str_list.append("%d-%d:%d" % (start,
                                 data[len(data)-1] + onediff[len(onediff)-1],
                                 onediff[len(onediff)-1]))
   return ','.join(str_list)

# --- Rik Poggi Solution -------------------------------------------------------
# With a minor edit for use in range function
def helper(lst):
    if len(lst) == 1:
        return str(lst[0]), []
    if len(lst) == 2:
        return ','.join(map(str,lst)), []

    step = lst[1] - lst[0]
    #for i,x,y in itertools.izip(itertools.count(1), lst[1:], lst[2:]):
    for i,x,y in itertools.izip(itertools.count(1),
                                itertools.islice(lst, 1, None, 1),
                                itertools.islice(lst, 2, None, 1)):
        if y-x != step:
            if i > 1:
                return '{}-{}:{}'.format(lst[0], lst[i]+step, step), lst[i+1:]
            else:
                return str(lst[0]), lst[1:]
    return '{}-{}:{}'.format(lst[0], lst[-1]+step, step), []

def list_to_ranges3(lst):
    result = []
    while lst:
        partial,lst = helper(lst)
        result.append(partial)
    return ','.join(result)

# --- Rik Poggi Solution 2 -----------------------------------------------------
def formatter(start, end, step):
    #return '{}-{}:{}'.format(start, end, step)
    return '{}-{}:{}'.format(start, end + step, step)

def list_to_ranges4(lst):
    n = len(lst)
    result = []
    scan = 0
    while n - scan > 2:
        step = lst[scan + 1] - lst[scan]
        if lst[scan + 2] - lst[scan + 1] != step:
            result.append(str(lst[scan]))
            scan += 1
            continue

        for j in xrange(scan+2, n-1):
            if lst[j+1] - lst[j] != step:
                result.append(formatter(lst[scan], lst[j], step))
                scan = j+1
                break
        else:
            result.append(formatter(lst[scan], lst[-1], step))
            return ','.join(result)

    if n - scan == 1:
        result.append(str(lst[scan]))
    elif n - scan == 2:
        result.append(','.join(itertools.imap(str, lst[scan:])))

    return ','.join(result)

# --- Test Function ------------------------------------------------------------
def test_data(data, f_to_test):
   data_str = f_to_test(data)
   _list = []
   for r in data_str.replace('-',':').split(','):
      r = [int(a) for a in r.split(':')]
      if len(r) == 1:
         _list.extend(r)
      elif len(r) == 2:
         _list.extend(range(r[0], r[1]))
      else:
         _list.extend(range(r[0], r[1], r[2]))
   return _list

# --- Timing Tests -------------------------------------------------------------
# Generate some sample data...
data_list = []
for i in range(5):
   # Note: using the "4000" and "5000" values below, the relative density of
   # the data can be changed.  This has a huge effect on the results
   # (particularly on the results for list_to_ranges3 which uses recursion).
   data_list.append(sorted(list(set([random.randint(1,4000) for a in \
                                      range(random.randint(5,5000))]))))

testfuncs = list_to_ranges1, list_to_ranges2, list_to_ranges3, list_to_ranges4
for f in testfuncs:
   print '\n', f.__name__
   for i, data in enumerate(data_list):
      t = timeit.Timer('f(data)', 'from __main__ import data, f')
      #print f(data)
      print i, data==test_data(data, f), round(t.timeit(200), 3)
Scott B
  • 2,542
  • 7
  • 30
  • 44
  • As I told in my answer, I wrote the code for Python 3, which means that if you want to run it under Python 2 you'll have to use [`izip()`](http://docs.python.org/library/itertools.html#itertools.izip), [`imap()`](http://docs.python.org/library/itertools.html#itertools.imap) and [`xrange()`](http://docs.python.org/library/functions.html#xrange). Or you could just put a couple of brackets around your print and run it with `python3` (that's what I did). – Rik Poggi Mar 26 '12 at 19:00
  • Another easy to apply improvement would be to use `zip(count(1), islice(lst, 1, None, 1), islice(lst, 2, None, 1))` instead of building new lists every time with `lst[1:]` and `lst[2:]`. Anyway, I'll think of some real improvement for my code. – Rik Poggi Mar 26 '12 at 19:17
  • Right...I made the changes, but even with those changes, I still find that the third method is slowest for large lists of numbers. – Scott B Mar 26 '12 at 19:18
  • I've updated my answer as I told you, I timed it and I got it to be the fastest one. Could that be fast enough? :) – Rik Poggi Mar 26 '12 at 21:19
  • There also seem to be an error in your timing function, this: `timeit.Timer('test_data(data, f)', ...)` should be `timeit.Timer('f(data)', ...)` otherwise you'll be timing the `test_data` function too, inserting a systematic error in the timings (you should also increase the *repeat* number, to have more accurate values). – Rik Poggi Mar 26 '12 at 21:45
  • Yes, the repeat number was larger before, but I decreased it for the large sets or else it took way to long. And I agree that it should test f(data) instead, so I'll update that. – Scott B Mar 26 '12 at 21:47
  • Yes, changing it to `f(data)` will be more appropriate because you'll be able to do relative comparisons (aka using %). I didn't used any `formatter` module, `formatter` it's a function that I defined at the beginning of my answer. SO have automatically moved the conversation in chat [here](http://chat.stackoverflow.com/rooms/9340/discussion-between-scott-b-and-rik-poggi). If you want we can continue there. – Rik Poggi Mar 26 '12 at 22:00
  • Updated it with your new solution and other comments. I still find the first to be quicker for larger sets of numbers, but your one to be quicker for some smaller sets. It also seems to depend on how "dense" the data is (i.e.....can it be reduced to a single expression like 1-5000:1, or is there many missing points so it is many appeneded ranges). – Scott B Mar 26 '12 at 22:16
  • You are still useing `range()`, **you'll have to use `xrange()` in Python 2**. I really don't want to sound like bragging, but do that and you'll see that mine is faster, even for larger set numbers. (To test performances you should test it on real case application) – Rik Poggi Mar 26 '12 at 22:43
  • I did update it to xrange before you posted this comment, so not sure how you saw that, but yes - with xrange, it seems to be quicker. thanks for the help...either would be perfectly fine in my greater application, so I won't waste more time trying to "optimize" anything more! Thanks again, :) – Scott B Mar 26 '12 at 22:45
0

This function should do what you need without requiring any imports.

def listToRanges(self, intList):
    ret = []
    for val in sorted(intList):
        if not ret or ret[-1][-1]+1 != val:
            ret.append([val])
        else:
            ret[-1].append(val)
    return ",".join([str(x[0]) if len(x)==1 else str(x[0])+"-"+str(x[-1]) for x in ret])