7

I am trying to find out whether a list of integers is coherent or 'at one stretch', meaning that the difference between two neighboring elements must be exactly one and that the numbers must be increasing monotonically. I found a neat approach where we can group by the number in the list minus the position of the element in the list -- this difference changes when the numbers are not coherent. Obviously, there should be exactly one group when the sequence does not contain gaps or repetitions.

Test:

>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [1, 2, 3, 4, 5, 7]
>>> l3 = [1, 2, 3, 4, 5, 5]
>>> l4 = [1, 2, 3, 4, 5, 4]
>>> l5 = [6, 5, 4, 3, 2, 1]
>>> def is_coherent(seq):
...     return len(list(g for _, g in itertools.groupby(enumerate(seq), lambda (i,e): i-e))) == 1
... 
>>> is_coherent(l1)
True
>>> is_coherent(l2)
False
>>> is_coherent(l3)
False
>>> is_coherent(l4)
False
>>> is_coherent(l5)
False

It works well, but I personally find that this solution is a bit too convoluted in view of the simplicity of the problem. Can you come up with a clearer way to achieve the same without significantly increasing the code length?

Edit: summary of answers

From the answers given below, the solution

def is_coherent(seq):
    return seq == range(seq[0], seq[-1]+1)

clearly wins. For small lists (10^3 elements), it is on the order of 10 times faster than the groupby approach and (on my machine) still four times faster than the next best approach (using izip_longest). It has the worst scaling behavior, but even for a large list with 10^8 elements it is still two times faster than the next best approach, which again is the izip_longest-based solution.

Relevant timing information obtained with timeit:

Testing is_coherent_groupby...
   small/large/larger/verylarge duration: 8.27 s, 20.23 s, 20.22 s, 20.76 s
   largest/smallest = 2.51
Testing is_coherent_npdiff...
   small/large/larger/verylarge duration: 7.05 s, 15.81 s, 16.16 s, 15.94 s
   largest/smallest = 2.26
Testing is_coherent_zip...
   small/large/larger/verylarge duration: 5.74 s, 20.54 s, 21.69 s, 24.62 s
   largest/smallest = 4.28
Testing is_coherent_izip_longest...
   small/large/larger/verylarge duration: 4.20 s, 10.81 s, 10.76 s, 10.81 s
   largest/smallest = 2.58
Testing is_coherent_all_xrange...
   small/large/larger/verylarge duration: 6.52 s, 17.06 s, 17.44 s, 17.30 s
   largest/smallest = 2.65
Testing is_coherent_range...
   small/large/larger/verylarge duration: 0.96 s, 4.14 s, 4.48 s, 4.48 s
   largest/smallest = 4.66

Testing code:

import itertools
import numpy as np
import timeit


setup = """
import numpy as np
def is_coherent_groupby(seq):
    return len(list(g for _, g in itertools.groupby(enumerate(seq), lambda (i,e): i-e))) == 1

def is_coherent_npdiff(x):
    return all(np.diff(x) == 1)

def is_coherent_zip(seq):
    return all(x==y+1 for x, y in zip(seq[1:], seq))

def is_coherent_izip_longest(l):
    return all(a==b for a, b in itertools.izip_longest(l, xrange(l[0], l[-1]+1)))

def is_coherent_all_xrange(l):
    return all(l[i] + 1 == l[i+1] for i in xrange(len(l)-1))

def is_coherent_range(seq):
    return seq == range(seq[0], seq[-1]+1)


small_list = range(10**3)
large_list = range(10**6)
larger_list = range(10**7)
very_large_list = range(10**8)
"""


fs = [
    'is_coherent_groupby',
    'is_coherent_npdiff',
    'is_coherent_zip',
    'is_coherent_izip_longest',
    'is_coherent_all_xrange',
    'is_coherent_range'
    ]


for n in fs:
    print "Testing %s..." % n
    t1 = timeit.timeit(
        '%s(small_list)' % n, 
        setup,
        number=40000
        )      
    t2 = timeit.timeit(
        '%s(large_list)' % n, 
        setup,
        number=100
        )     
    t3 = timeit.timeit(
        '%s(larger_list)' % n, 
        setup,
        number=10
        )
    t4 =  timeit.timeit(
        '%s(very_large_list)' % n, 
        setup,
        number=1
        )
    print "   small/large/larger/verylarge duration: %.2f s, %.2f s, %.2f s, %.2f s" % (t1, t2, t3, t4)
    print "   largest/smallest = %.2f" % (t4/t1)

Test machine:

  • Linux 3.2.0 (Ubuntu 12.04)
  • Python 2.7.3 (gcc 4.1.2)
  • numpy 1.6.2 built with Intel compiler
  • CPU: E5-2650 @ 2.00GHz
  • 24 GB of memory
Community
  • 1
  • 1
Dr. Jan-Philip Gehrcke
  • 33,287
  • 14
  • 85
  • 130

6 Answers6

5

how bout

sorted_list = sorted(my_list)
return sorted_list == range(sorted_list[0],sorted_list[-1]+1)

or if its only coherent if it is already sorted

return my_list == range(my_list[0],my_list[-1]+1)

if you are using python 3 you will need list(range(...))

Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • Couldn't you then just do `list == range(list[0], list[-1]+1)`? :) – Dr. Jan-Philip Gehrcke Aug 08 '13 at 17:03
  • You don't need to sort or any `min` or `max`. Monotonicity is requirement! See: `>>> l1 == range(l1[0], l1[-1]+1) -> True` -- I like this approach, now we just need to think about performance :) – Dr. Jan-Philip Gehrcke Aug 08 '13 at 17:04
  • yeah I dont know ... it depends on how you define coherent ahh you added more examples in an edit I see :) ... – Joran Beasley Aug 08 '13 at 17:05
  • 1
    You'll need to wrap `range()` in `list()` for Python 3; maybe worth mentioning that. – Zero Piraeus Aug 08 '13 at 17:05
  • @JoranBeasley: I defined it as "meaning that the difference between two neighboring elements must be exactly one and that the numbers must be increasing monotonically" in the question. – Dr. Jan-Philip Gehrcke Aug 08 '13 at 17:06
  • Would you mind discussing the performance of this approach? It requires creation of one potentially large list and comparing two potentially large lists. Also, regarding the requirements, you can safely add `list == range(list[0], list[-1]+1)` to your answer! (edit: now Carl has already done it...) – Dr. Jan-Philip Gehrcke Aug 08 '13 at 17:12
  • range(0,100000000) takes about .5 seconds ... comparison took about 1 second (on my system) – Joran Beasley Aug 08 '13 at 17:21
  • The `list == range(list[0], list[-1]+1)` solution wins, see edit in my answer for detailed timing info. Thanks! – Dr. Jan-Philip Gehrcke Aug 09 '13 at 09:43
2

Unless I'm overlooking something in your examples, this simpler solution is actually shorter.

>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [1, 2, 3, 4, 5, 7]
>>> l3 = [1, 2, 3, 4, 5, 5]
>>> l4 = [1, 2, 3, 4, 5, 4]
>>> l5 = [6, 5, 4, 3, 2, 1]
>>> 
>>> def is_coherent(seq):
...     return seq == range(seq[0], seq[0]+len(seq), 1)
... 
>>> is_coherent(l1)
True
>>> is_coherent(l2)
False
>>> is_coherent(l3)
False
>>> is_coherent(l4)
False
>>> is_coherent(l5)
False
>>> 

The results of some basic performance tests seem to indicate that this method is significantly quicker (I've added your example as is_coherent2):

Carl > python -m timeit -s 'from t import is_coherent, l1' 'is_coherent(l1)'
1000000 loops, best of 3: 0.782 usec per loop
Carl > python -m timeit -s 'from t import is_coherent, l3' 'is_coherent(l3)'
1000000 loops, best of 3: 0.796 usec per loop
Carl > python -m timeit -s 'from t import is_coherent2, l1' 'is_coherent2(l1)'
100000 loops, best of 3: 4.54 usec per loop
Carl > python -m timeit -s 'from t import is_coherent2, l3' 'is_coherent2(l3)'
100000 loops, best of 3: 4.93 usec per loop
Carl Groner
  • 4,149
  • 1
  • 19
  • 20
2

If you're looking for a numpy solution:

import numpy as np

def is_coherent(x):
    return all(np.diff(x) == 1)

is_coherent(np.array([1,2,3,4,5]))
Out[39]: True

is_coherent(np.array([1,2,3,4,8]))
Out[40]: False
Community
  • 1
  • 1
roippi
  • 25,533
  • 4
  • 48
  • 73
1
def is_coherent(seq):
    return all(x==y+1 for x, y in zip(seq[1:], seq))
Brionius
  • 13,858
  • 3
  • 38
  • 49
1

This short circuits and does not create an extra list making it useful for testing very large lists.

def is_coherent(l):
    return all(a==b for a, b in izip_longest(l, xrange(l[0], l[-1]+1)))

Or

def is_coherent(l):
    return all(l[i] + 1 == l[i+1] for i in xrange(len(l)-1))
Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
0

i dont know python but i know its functional so heres a small loop function that will do it if you change the syntax for correct python.

PSEUDO CODE

def is_coherent(seq):
     for x in xrange(1, len(seq)-1):
        if (seq[x+1]-seq[x] != 1)  return false;   
     return true
MingMan
  • 911
  • 1
  • 10
  • 31