Efficient one-liner to subsample `i`th `n` elements out of every `m` elements on a list

Question

I am looking for a memory/cpu efficient one-liner to subsample n out of every m elements on a list. So far I have got:

sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel (sample) size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size

[i for l in [sb[j+ci-1:j+ci-1+cs] for j
    in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l]

Out: [11, 12, 11, 12, 11, 12, 11, 12]

Breaking it down I am creating a list of sample lists and then flattening it to a one dimensional list with [i for l in ll for i in l]

Alternatively, not a one liner, but easier to read, I can do:

os = []
for i in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]]: os = os+i

Both solutions look way too convoluted when compared, for example, with the super simple shorthand for cs=1 particular case: sb[ci-1::fs].

Can you help me come up with a decent solution?

do you need random sampling? `random.sample([11,12,21,22,31,32]*4,8) ` would draw 8 random samples from the list, without repeating any index on sampling - your solution looks more like a linear every n.th sublis sampling — Patrick Artner, Jan 24 '18 at 19:27
@Patrick Artner: No. I need to extract a channel out of a stream (channel demultiplexing). The stream is flattened on a list of elements. Every channel takes cs contiguous elements per sample. Every frame on the stream contains nc consecutive samples, one per channel. I want to extract every sample from a particular channel from the stream onto a new list, keeping the order, of course. — NotGaeL, Jan 24 '18 at 19:32
Are you trying to get the first *n* number of the identical frames from the stream buffer, which in this case is 2 elements in each of the 4 identical frame? — r.ook, Jan 24 '18 at 19:32
@Idlehands: more or less. Yes in this case 2 elements in each of the 4 identical frames, from the selected channel. (Sorry about the first comment it was me misunderstanding what you wrote). — NotGaeL, Jan 24 '18 at 19:37

score 2 · Answer 1 · answered Jan 24 '18 at 20:45

2

The following seems fairly readable to me (and is also fairly efficient):

from itertools import chain

sb = [11, 12, 21, 22, 31, 32]*4  # stream buffer, e.g. 4 identical frames

ci = 1      # 1-indexed channel index
cs = 2      # channel size
nc = 3      # number of channels in each frame
fs = nc*cs  # frame size

result = list(chain.from_iterable(sb[i: i+cs] for i in xrange(ci-1, len(sb), fs)))
print(result)  # -> [11, 12, 11, 12, 11, 12, 11, 12]

answered Jan 24 '18 at 20:45

martineau

119,623
25
170
301

I was also looking into itertools, it is clearer if you are familiar with it so it is a good option. Still I would expect an even simpler solution from a library that's supposed to help you iterate and just that. Is this such an uncommon problem I am trying to solve? – NotGaeL Jan 24 '18 at 21:07
@NotGaeL can you describe the simplicity you're looking for? – pylang Jan 24 '18 at 21:58
NotGaeL: You may not realize it, but the argument being passed to `chain.from_iterable()` is what is known as a [generator expression](https://docs.python.org/3/reference/expressions.html#generator-expressions), which means it, too, is being executed in an iterative fashion. That's part of what I meant when I said the code was fairly efficient. The main point being that Python does some things iteratively without you needing to use `itertools` to make it happen. Sounds like you expect some existing library function to do all the specialized stuff you want done. – martineau Jan 24 '18 at 22:39
@pylang: `sb[ci-1::fs]` – NotGaeL Jan 25 '18 at 08:59
@martineau: I am expecting an itertools specialized library to provide the iteration tools to simplify some specialized iterative stuff (iterate and select the `k`th `n` elements out of every `m` elements). It is such a simple task even natural language does the work in one sentence. I don't want to bring down the discussion to sarcasm, I mean it's just what it's expected from a whole library dedicated to facilitate the iteration over elements. – NotGaeL Jan 25 '18 at 09:03
@NotGael If you can generalize your problem and it actually is helpful to other people, you can submit a patch it might get accepted. That's the beauty of open source. – pylang Jan 25 '18 at 09:12

Patrick Artner · Accepted Answer · 2018-01-24T20:21:45.773

I moved most of the indexing into a range() computation. Its faster then manifesting the indexes into a sublist - see timing down below:

sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size

for ci in range(1,4):
    print [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y]

Output:

[11, 12, 11, 12, 11, 12, 11, 12]
[21, 22, 21, 22, 21, 22, 21, 22]
[31, 32, 31, 32, 31, 32, 31, 32]

I moved most of the work into the range() call - producing list of sublists, the rest is a simple decomposition of the sublists into one list.

range((ci-1)*cs,len(sb), fs)
         |         |     |________  frame size, range will use steps the size of the frame
         |         |______________  till end of data
         |________________________  starting at (ci-1) * channel size   

for ci = 1 it starts at 0,   6,12,18,....
for ci = 2 it starts at 2,   8,14,....
for ci = 3 it starts at 4,  10,...  
for ci = 4 it starts at 6,  ...  
    and increases by fs = 6 until end of data. The list comp then gets a sublist of len cs
    and the rest of the list-comp flattens it down from list of list to a simpler list

Timing:

import timeit

print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y] 

''', setup='pass', number=10000)  #  0.588474035263

print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [i for l in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l] 

''', setup='pass', number=10000)   # 0.734045982361

I like it better your way as it is a little bit more readable, but it is essentially the same solution, right? It is still too complex and it does not improve performance. Isn't there a simpler way? — NotGaeL, Jan 24 '18 at 19:52
@NotGaeL Essentially I get rid of one manifested list by doing the indexing-math in the range() object which is highly optimized generator, so if "should" perfom somethwat better. You would need to measuere it against your data to make sure. — Patrick Artner, Jan 24 '18 at 19:55
@NotGaeL upped it to *20 and performed it 10000 times, my version is almost 20% faster — Patrick Artner, Jan 24 '18 at 20:05
wow, that's way more than I would imagine. It still pains me that it is so difficult to read, but definitely a great improvement Thanks for the help! — NotGaeL, Jan 24 '18 at 21:08
if you remove the costly data setup from the execution, the timed portions change to 0.281064033508 vs 0.385651111603 - about 72.88% of your solution — Patrick Artner, Jan 25 '18 at 09:56

r.ook · Answer 3 · 2018-01-25T05:20:14.853

-1

Code:

sb = [11,12,21,22,31,32] * 4
ci = 0
cs = 2
nc = 3
fs = cs * nc
result = list(sum(zip(*[sb[i::fs] for i in range(ci, ci+cs)]),()))

Output:

[11, 12, 11, 12, 11, 12, 11, 12]

I recommend setting ci to 0 based index to match python's syntax, but if you insist, it's simple to update the func, just replace all the ci with ci-1.

It's essentially the same idea as your original approach, just a bit cleaner, and it can scale to different ci, cs and nc.

edited Jan 25 '18 at 05:20

answered Jan 24 '18 at 19:52

r.ook

13,466
2
22
39

1

This does not produce that output and also it is not a single list but a list of lists. – NotGaeL Jan 24 '18 at 19:55
Okay, let me take a look. I'm still trying to decypher what `ci` is supposed to be. That's why my comment asks if you're only looking for *n* elements from a given frame size. – r.ook Jan 24 '18 at 19:57
ci is the 1-indexed channel index – NotGaeL Jan 24 '18 at 19:58
1

if `ci==1 and cs==2` then you take the first and second element on each frame of `nc*cs` elements from the list. If `ci==2 and cs==2` then you would be taking the 3rd and 4th element from each frame and so on... – NotGaeL Jan 24 '18 at 20:00
Ugh, I set up my list to be `range(24)` to see the data clearly but that became my downfall for `i` because I confused it with the index. My bad, I deserved that. – r.ook Jan 24 '18 at 21:23
1

use `list(it.chain(*results))` – pylang Jan 24 '18 at 22:07
@NotGaeL check my edit, I know Patrick already answered but I have fixed my answer. Just wanted to redeem myself a bit. – r.ook Jan 25 '18 at 01:10
Don't use `sum()` to flatten a list. As I understand, it has quadratic performance. https://stackoverflow.com/a/952946/4531270. Use `itertools`. – pylang Jan 25 '18 at 10:07
@pylang I understand, I used `sum` as another martineau and you have both showed a `itertools.chain` solution. OP lightly mentioned he was expecting if it's using itertools then a more lightweight solution should exist, so I stuck with the build-ins and offered something different that was not yet on the table. I do agree it is not the best solution compared to the others. – r.ook Jan 25 '18 at 15:26

pylang · Answer 4 · 2018-01-24T22:00:58.753

I advise to use more clear variable names instead of comments and not to use one-liners.

Given

import itertools as it 


stream = [11, 12, 21, 22, 31, 32] * 4
ch_idx = 1
ch_size = 2 
num_chs = 3

Code

Using the grouper itertools recipe:

channels = grouper(ch_size, stream)
frames = grouper(num_chs, channels)
list(it.chain.from_iterable(*it.islice(zip(*frames), ch_idx)))
# [11, 12, 11, 12, 11, 12, 11, 12]

As a one-liner, it looks as follows:

list(it.chain.from_iterable(*it.islice(zip(*grouper(num_chs, grouper(ch_size, stream))), ch_idx)))
# [11, 12, 11, 12, 11, 12, 11, 12]

Details

The grouper recipe is implemented as follows:

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

See also the more_itertools third-party library for pre-implemented recipes.

Efficient one-liner to subsample `i`th `n` elements out of every `m` elements on a list

4 Answers4