3

I'm not looking for a solution (I have two ;) ), but for insight into the strengths and weaknesses of each one, considering Python's internals. Thanks!

A coworker and I want to compute the difference between each pair of successive list elements. So, for the list:

[1,2,4]

the expected output is :

[1,2]

(since 2-1 = 1, and 4-2 = 2).

We came up with two solutions and I am not sure how they compare. The first one is very C-like: it treats the list as an array and subtracts each element from the one that follows it.

res = []
for i in range(0, len(a)-1):
    res.append(a[i+1] - a[i])

The second one (for a list "l") is, I think, more Pythonic:

[j - i for i,j in zip(l[:-1], l[1:])]

But isn't it far less efficient to build two copies of the list just to extract the differences? How does Python handle this internally?

Thanks for your insights !

Jokyjo
  • You should post both solutions, they'll be much easier to compare. – TheSoundDefense Jul 22 '14 at 14:35
  • If you don't want to build list copies, use `itertools.izip` and `itertools.islice`. – jonrsharpe Jul 22 '14 at 14:42
  • Are these lists going to be large enough to care about performance? Just write the most readable and maintainable version. – Daenyth Jul 22 '14 at 15:09
  • You could, of course, simply use `numpy.diff([1,2,4])` - but that doesn't answer your question – loopbackbee Jul 22 '14 at 15:15
  • Note: you can easily improve the performance of the second solution by using `l` instead of `l[:-1]`. `zip` already terminates when the *shortest* input is exhausted, so there is no need to remove the last element. – Bakuriu Jul 22 '14 at 17:25
  • @Daenyth: Yes, probably around a few million elements. – Jokyjo Jul 23 '14 at 09:08
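
The suggestions in the comments above can be sketched as follows. This is a minimal illustration, assuming Python 3, where the built-in `zip` is already lazy and takes the place of `itertools.izip`; the variable names `two_copies`, `one_copy`, and `no_copy` are made up for this example.

```python
from itertools import islice

l = [1, 2, 4]

# Form from the question: two slices, so two temporary copies of the list.
two_copies = [j - i for i, j in zip(l[:-1], l[1:])]

# zip stops at the shortest input, so only the shifted slice is needed.
one_copy = [j - i for i, j in zip(l, l[1:])]

# islice walks the list lazily from index 1, so no copy of the list is made.
no_copy = [j - i for i, j in zip(l, islice(l, 1, None))]

print(two_copies, one_copy, no_copy)  # [1, 2] [1, 2] [1, 2]
```

Note that a slice is a shallow copy: it duplicates the list of references, not the elements themselves. On Python 3.10 or later, `itertools.pairwise(l)` yields the same consecutive pairs without any slicing.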

3 Answers

3

With a generator:

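def diff_elements(lst):
    """
    >>> list(diff_elements([]))
    []
    >>> list(diff_elements([1]))
    []
    >>> list(diff_elements([1, 2, 4, 7]))
    [1, 2, 3]
    """
    as_iter = iter(lst)
    try:
        last = next(as_iter)
    except StopIteration:
        # Empty input: end the generator cleanly. Since Python 3.7 (PEP 479),
        # a StopIteration escaping a generator body becomes a RuntimeError.
        return
    for value in as_iter:
        yield value - last
        last = value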

This has the nice properties of:

  1. Being readable, and
  2. Working on infinitely large data sets.
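
To illustrate the second property, here is a hedged usage sketch that feeds an unbounded stream through the same kind of generator. The name `diff_stream` and the `squares` stream are assumptions made for this example, and the generator is restated so that the block runs on its own.

```python
from itertools import count, islice


def diff_stream(values):
    """Yield differences between consecutive items of any iterable."""
    it = iter(values)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for value in it:
        yield value - last
        last = value


# An infinite stream of squares: 0, 1, 4, 9, 16, 25, ...
squares = (n * n for n in count())

# Only the items actually requested are ever computed.
first_five = list(islice(diff_stream(squares), 5))
print(first_five)  # [1, 3, 5, 7, 9]
```

Because the generator keeps only the previous value in memory, its memory use does not grow with the length of the input.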
Kirk Strauser
0

If I understood your question correctly, I suggest something like this:

diffList = lambda l: [(l[i] - l[i-1]) for i in range(1, len(l))]
answer = diffList( [ 1,2,4] )

This function will give you a list with the differences between all consecutive elements in the input list.

This is similar to your first approach (and still somewhat Pythonic), which is more efficient than the second one.

0

With no lambdas:

[l[i+1] - l[i] for i in range(len(l) - 1)]

E.g.:

>>> l = [1, 4, 8, 15, 16]
>>> [l[i+1] - l[i] for i in range(len(l) - 1)]
[3, 4, 7, 1]

It is a bit faster, as you can see (EDIT: added the most-voted solution from https://stackoverflow.com/a/2400875/1171280):

>>> import timeit
>>> 
>>> s = """\
...     l = [1, 4, 7, 15, 16]
...     [l[i+1] - l[i] for i in range(len(l) - 1)]
... """


>>> r = """\
...     l = [1, 4, 7, 15, 16]
...     [j - i for i,j in zip(l[:-1], l[1:])]
... """

>>> t = """\
...     l = [1, 4, 7, 15, 16]
...     [j-i for i, j in itertools.izip(l[:-1], l[1:])]
... """

>>> timeit.timeit(stmt=s, number=100000)
0.09615588188171387
>>> timeit.timeit(stmt=s, number=100000)
0.09774398803710938
>>> timeit.timeit(stmt=s, number=100000)
0.09683513641357422
#-------------
>>> timeit.timeit(stmt=r, number=100000)
0.14137601852416992
>>> timeit.timeit(stmt=r, number=100000)
0.12511301040649414
>>> timeit.timeit(stmt=r, number=100000)
0.12285017967224121
#-------------
>>> timeit.timeit(stmt=t, number=100000)
0.11506795883178711
>>> timeit.timeit(stmt=t, number=100000)
0.11677718162536621
>>> timeit.timeit(stmt=t, number=100000)
0.11829996109008789
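
The timings above use a five-element list and rebuild it inside every timed statement, so they mostly measure fixed overhead. For lists closer to the sizes mentioned in the question, a benchmark along the following lines may be more telling. This is only a sketch, assuming Python 3; the list size `N`, the labels in `candidates`, and the repeat counts are arbitrary choices, and no results are shown because they depend on the machine and interpreter.

```python
import timeit

N = 1_000_000  # assumed size, in the range mentioned in the question

# Build the list once in setup so that only the differencing is timed.
setup = f"l = list(range({N}))"

candidates = {
    "index": "[l[i+1] - l[i] for i in range(len(l) - 1)]",
    "zip_two_slices": "[j - i for i, j in zip(l[:-1], l[1:])]",
    "zip_one_slice": "[j - i for i, j in zip(l, l[1:])]",
}

timings = {}
for name, stmt in candidates.items():
    # Take the best of several repeats to reduce noise from other processes.
    timings[name] = min(timeit.repeat(stmt=stmt, setup=setup, number=3, repeat=3))
    print(f"{name}: {timings[name]:.3f} s for 3 runs")
```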
Alberto Megía