23

I'm looking for an efficient way of achieving this, which I think is a slicing-like operation:

>>> mylist = range(100)
>>>magicslicer(mylist, 10, 20)
[0,1,2,3,4,5,6,7,8,9,30,31,32,33,34,35,36,37,38,39,60,61,62,63......,97,98,99]

the idea is: the slicing gets 10 elements, then skips 20 elements, then gets next 10, then skips next 20, and so on.

I think I should not use loops if possible, for the very reason to use slice is (I guess) to do the "extraction" efficiently in a single operation.

Thanks for reading.

heltonbiker
  • 26,657
  • 28
  • 137
  • 252

8 Answers8

22

itertools.compress (new in 2.7/3.1) nicely supports use cases like this one, especially when combined with itertools.cycle:

from itertools import cycle, compress
seq = range(100)
criteria = cycle([True]*10 + [False]*20) # Use whatever pattern you like
>>> list(compress(seq, criteria))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

Python 2.7 timing (relative to Sven's explicit list comprehension):

$ ./python -m timeit -s "a = range(100)" "[x for start in range(0, len(a), 30) for x in a[start:start+10]]"
100000 loops, best of 3: 4.96 usec per loop

$ ./python -m timeit -s "from itertools import cycle, compress" -s "a = range(100)" -s "criteria = cycle([True]*10 + [False]*20)" "list(compress(a, criteria))"
100000 loops, best of 3: 4.76 usec per loop

Python 3.2 timing (also relative to Sven's explicit list comprehension):

$ ./python -m timeit -s "a = range(100)" "[x for start in range(0, len(a), 30) for x in a[start:start+10]]"
100000 loops, best of 3: 7.41 usec per loop

$ ./python -m timeit -s "from itertools import cycle, compress" -s "a = range(100)" -s "criteria = cycle([True]*10 + [False]*20)" "list(compress(a, criteria))"
100000 loops, best of 3: 4.78 usec per loop

As can be seen, it doesn't make a great deal of difference relative to the in-line list comprehension in 2.7, but helps significantly in 3.2 by avoiding the overhead of the implicit nested scope.

A similar difference can also be seen in 2.7 if the aim is to iterate over the resulting sequence rather than turn it into a fully realised list:

$ ./python -m timeit -s "a = range(100)" "for x in (x for start in range(0, len(a), 30) for x in a[start:start+10]): pass"
100000 loops, best of 3: 6.82 usec per loop
$ ./python -m timeit -s "from itertools import cycle, compress" -s "a = range(100)" -s "criteria = cycle([True]*10 + [False]*20)" "for x in compress(a, criteria): pass"
100000 loops, best of 3: 3.61 usec per loop

For especially long patterns, it is possible to replace the list in the pattern expression with an expression like chain(repeat(True, 10), repeat(False, 20)) so that it never has to be fully created in memory.

ncoghlan
  • 40,168
  • 10
  • 71
  • 80
  • 4
    Nice! Note that the explicit list comprehension will still be faster if you are skipping a lot of values, since `compress()` has to iterate over all of them while the list comprehension really skips them. – Sven Marnach Apr 28 '11 at 08:17
  • 2
    True, although the converse benefit for the `compress()` approach is that it works with arbitrary iterables, even ones that don't support slicing. – ncoghlan Apr 29 '11 at 04:23
  • Oh, my, I created a "monster question" :o) This answer is very what-I-was-looking-for, since I heard the "C level" magic words. I will have to test it soon. Also, this nice way to use `python` straight from command line is new to me. Thanks. – heltonbiker May 01 '11 at 15:30
13

Maybe the best way is the straight-forward approach:

def magicslicer(seq, take, skip):
    return [x for start in range(0, len(seq), take + skip)
              for x in seq[start:start + take]]

I don't think you can avoid the loops.

Edit: Since this is tagged "performance", here a comparison with the modulo solution for a = range(100):

In [2]: %timeit [x for start in range(0, len(a), 30)
                   for x in a[start:start + 10]]
100000 loops, best of 3: 4.89 us per loop

In [3]: %timeit [e for i, e in enumerate(a) if i % 30 < 10]
100000 loops, best of 3: 14.8 us per loop
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • great! Now we also have a simple and useful example of `timeit` use! Thanks! – heltonbiker Apr 27 '11 at 17:20
  • 1
    `itertools.compress` allows the looping to be pushed down to the C level, and is as fast as a list comprehension and faster than a generator expression in 2.7 and faster than both in 3.2 (due to the implicit nested scope in 3.x comprehensions). – ncoghlan Apr 28 '11 at 05:46
4

I think that slices cannot do it, unfortunately. I'd solve the problem using list comprehensions

>>> a = range(100)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    ...
 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> [e for i, e in enumerate(a) if i % 30 < 10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
brandizzi
  • 26,083
  • 8
  • 103
  • 158
1

I don't know if you are working with numbers only, but in case you are there is a faster way if you stick to numpy. But the following will only work if you have list that consists of sublists of equal length that were flattened out.

For comparison:

import numpy as np
from itertools import cycle, compress

startList = list(range(0, 3000))
startNpArray = np.linspace(0,2999,3000,dtype=np.int)

def WithNumpy(seq, keep, skip):
    return seq.reshape((-1, keep+skip))[:,:keep+1].flatten()

def WithItertools(seq, keep, skip):
    criteria = cycle([True]*keep + [False]* skip)
    return list(compress(seq, criteria))

%timeit WithNumpy(startListNp, 10, 20)
>>> 2.59 µs ± 48.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit WithItertools(startList, 10, 20)
>>> 33.5 µs ± 911 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
TheoryX
  • 135
  • 10
0

I'd use a loop:

#!/usr/bin/env python


def magicslicer(l, stepsize, stepgap):
    output = []
    i = 0
    while i<len(l):
        output += l[i:i+stepsize]
        i += stepsize + stepgap
    return output


mylist = range(100)
print magicslicer(mylist,10,20)
Douglas Leeder
  • 52,368
  • 9
  • 94
  • 137
0
>>>[mylist[start:start+10] for start in mylist[::30]]
>>>[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]

but i obtain a list of list :(

Cédric Julien
  • 78,516
  • 15
  • 127
  • 132
0

[x for x in range(100) if x%30 < 10] is another way to do it. But, this can be slow as the list size grows.

A function on the same lines

def magic_slice(n, no_elems, step):
    s = no_elems + step
    return [x for x in range(n) if x%s < no_elems]
dheerosaur
  • 14,736
  • 6
  • 30
  • 31
  • The people suggesting list comprehensions are on the right track. You are correct that a loop is needed, but list comprehensions put the loop inside the Python engine where it is faster than an explicit loop. – Michael Dillon Apr 27 '11 at 15:10
0
mylist = range(100)

otherlist = ['21','31','689','777','479','51','71','yut','poi','ger',
             '11','61','789','zozozozo','8888','1']



def magic_slicer(iterable,keep,throw):
        it = iter(iterable).next
        for n in xrange((len(iterable)//keep+throw)+1):
                for i in xrange(keep):  yield it()
                for i in xrange(throw):  it()

print list(magic_slicer(mylist,10,20))
print
print list(magic_slicer(otherlist,2,3))


print '__________________'


def magic_slicer2(iterable,keep,throw):
        return ( x for i,x in enumerate(iterable) if -1< i%(keep+throw)<keep) 

print list(magic_slicer2(mylist,10,20))
print
print list(magic_slicer2(otherlist,2,3))

result

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

['21', '31', '51', '71', '11', '61', '1']
__________________
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

['21', '31', '51', '71', '11', '61', '1']
eyquem
  • 26,771
  • 7
  • 38
  • 46