Is there a pythonic way to sample N consecutive elements from a list or numpy array

Question

Is there a pythonic way to select N consecutive elements from a list or numpy array.

So Suppose:

Choice = [1,2,3,4,5,6]

I would like to create a new list of length N by randomly selecting element X in Choice along with the N-1 consecutive elements following choice.

So if:

X = 4 
N = 4

The resulting list would be:

Selection = [5,6,1,2]

I think something similar to the following would work.

S = [] 
for i in range(X,X+N):
    S.append(Selection[i%6])

But I was wondering if there is a python or numpy function that can select the elements at once that was more efficient.

why not just randomly choose a starting index from [0, len(choices) - N)? — alkasm, Jan 27 '21 at 02:23
Check out the [slice notation](https://stackoverflow.com/questions/509211/understanding-slice-notation): `my_list[1:3]`. You will need to figure out the logic if you consider the list circular. — naicolas, Jan 27 '21 at 02:23
Does this answer your question? [Cycle through list starting at a certain element](https://stackoverflow.com/questions/8940737/cycle-through-list-starting-at-a-certain-element) — busybear, Jan 27 '21 at 02:26
@alkasm thanks that's a great idea. Wanted to preserve the permutation and not sure how to sort it if it wasn't sequential. — phntm, Jan 27 '21 at 03:13
Oh sorry @Nick, as you can see i'm a noob and didn't really understand the significance of your question...i think in most cases it will be less than len(Choice). Does that create different solutions? — phntm, Jan 27 '21 at 08:12
@phntm yes, if `N <= len(Choice)` the answer can be much simpler. See the edit to my answer — Nick, Jan 27 '21 at 08:23

chepner · Accepted Answer · 2021-01-27T02:27:43.083

10

Use itertools, specifically islice and cycle.

start = random.randint(0, len(Choice) - 1)
list(islice(cycle(Choice), start, start + n))

cycle(Choice) is an infinite sequence that repeats your original list, so that the slice start:start + n will wrap if necessary.

edited Jan 27 '21 at 02:27

answered Jan 27 '21 at 02:25

chepner

497,756
71
530
681

This is awesome thanks! Is using itertools generally faster than numpy? – phntm Jan 27 '21 at 03:08
2

@phntm: For large inputs, it's almost guaranteed to be slower than any other solution, as it's `O(n)` on both memory usage and processing time (`islice` can't skip values, `cycle` stores them all until the input is exhausted). For small inputs, it hardly matters what solution you use. – ShadowRanger Jan 27 '21 at 03:48

Nick · Answer 2 · 2021-01-27T08:22:45.267

You could use a list comprehension, using modulo operations on the index to keep it in range of the list:

Choice = [1,2,3,4,5,6] 
X = 4 
N = 4
L = len(Choice)
Selection = [Choice[i % L] for i in range(X, X+N)]
print(Selection)

Output

[5, 6, 1, 2]

Note that if N is less than or equal to len(Choice), you can greatly simplify the code:

Choice = [1,2,3,4,5,6] 
X = 4 
N = 4
L = len(Choice)
Selection = Choice[X:X+N] if X+N <= L else Choice[X:] + Choice[:X+N-L]
print(Selection)

Marc · Answer 3 · 2021-01-27T03:42:20.653

Since you are asking for the most efficient way I created a little benchmark to test the solutions proposed in this thread.

I rewrote your current solution as:

def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection

Where choice is the input list and x is the random index.

These are the results if choice contains 1_000_000 random numbers:

chepner: 0.10840400000000017 s
nick: 0.2066781999999998 s
op: 0.25887470000000024 s
fountainhead: 0.3679908000000003 s

Full code

import random
from itertools import cycle, islice
from time import perf_counter as pc
import numpy as np


def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection


def nick(choice, x):
    n = len(choice)
    return [choice[i % n] for i in range(x, x + n)]


def fountainhead(choice, x):
    n = len(choice)
    return np.take(choice, range(x, x + n), mode='wrap')


def chepner(choice, x):
    n = len(choice)
    return list(islice(cycle(choice), x, x + n))


results = []
n = 1_000_000
choice = random.sample(range(n), n)
x = random.randint(0, n - 1)

# Correctness
assert op(choice, x) == nick(choice,x) == chepner(choice,x) == list(fountainhead(choice,x))

# Benchmark
for f in op, nick, chepner, fountainhead:
    t0 = pc()
    f(choice, x)
    t1 = pc()
    results.append((t1 - t0, f))

for t, f in sorted(results):
    print(f'{f.__name__}: {t} s')

Thanks for taking the time - it's always interesting to see performance results. — Nick, Jan 27 '21 at 04:39
Interestingly on my computer, for 1,000,000 entries the advantage of `chepner` is only about 15-20%, and it decreases further as the list length gets longer. At shorter lengths (10k or less), `chepner` is more than 2x faster. — Nick, Jan 27 '21 at 04:53
@Nick yes `chepner` solution seems the fastest for a `list`. For larger list size (>10^6) using an array as input (like `array('i', choice)`) is faster and takes less memory. If `choice` is a Numpy array then [this solution](https://stackoverflow.com/a/65912922/8484783) is the fastest regardless of the input size. — Marc, Jan 27 '21 at 06:36

fountainhead · Answer 4 · 2021-01-27T05:15:09.250

3

If using a numpy array as the source, we could of course use numpy "fancy indexing".

So, if ChoiceArray is the numpy array equivalent of the list Choice, and if L is len(Choice) or len(ChoiceArray):

Selection = ChoiceArray [np.arange(X, N+X) % L]

edited Jan 27 '21 at 05:15

answered Jan 27 '21 at 04:35

fountainhead

3,584
1
8
17

fountainhead · Answer 5 · 2021-01-27T02:45:53.123

2

Here's a numpy approach:

import numpy as np

Selection = np.take(Choice, range(X,N+X), mode='wrap')

Works even if Choice is a Python list rather than a numpy array.

edited Jan 27 '21 at 02:45

answered Jan 27 '21 at 02:40

fountainhead

3,584
1
8
17

Thanks this is really helpful! Is there a difference in terms of using np.random.default_rng vs np.random.choice? – phntm Jan 27 '21 at 03:18
1

@phntm - Yes there are differences. I think [this SO thread](https://stackoverflow.com/questions/60599149/what-is-the-difference-between-numpy-randoms-generator-class-and-np-random-meth) discusses that – fountainhead Jan 27 '21 at 05:07

Is there a pythonic way to sample N consecutive elements from a list or numpy array

5 Answers5

Full code