
Is there an efficient or elegant way to retrieve all the k-size sublists of a list in Python? For example:

arr = [2, 3, 5, 7, 11, 13]

I want all 3-element sublists:

result = [[2, 3, 5],
          [3, 5, 7],
          [5, 7, 11],
          [7, 11, 13]]

I know I could create this with a for loop, slicing the list with arr[i:i+3], but the lists I'm dealing with are gigantic and I'm hoping for an efficient mechanism, or at least an elegant or Pythonic mechanism.

I'm using Pandas as well, so I'm happy to use a Pandas-based mechanism.

at.
    This will help: https://codereview.stackexchange.com/questions/196350/slice-a-list-or-numpy-array-into-consecutive-tuples – Pygirl Mar 17 '21 at 07:14

4 Answers


If you actually want to construct the list, I don't think you'll do better than a basic list comprehension like this:

arr = [2, 3, 5, 7, 11, 13]
k = 3
result = [arr[i:i+k] for i in range(len(arr)-k+1)]

If you want to minimize memory use, you could use a generator:

arr = [2, 3, 5, 7, 11, 13]
def window(arr, k):
    for i in range(len(arr)-k+1):
        yield arr[i:i+k]

for group in window(arr, 3):
    ...  # do something with group

You could also do something where you zip together k copies of the list, each offset by one. But that would take as much memory as the first solution, probably without much performance advantage.
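
For reference, a minimal sketch of that zip-based idea, using the question's arr with k = 3:

arr = [2, 3, 5, 7, 11, 13]
k = 3
# zip k slices of the list, each shifted one position further to the right
result = [list(t) for t in zip(*(arr[i:] for i in range(k)))]
# [[2, 3, 5], [3, 5, 7], [5, 7, 11], [7, 11, 13]]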

There may be something quick and efficient in numpy or pandas, but you would need to show more about what your input and output should look like.
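
If NumPy 1.20 or newer is available, one option is numpy.lib.stride_tricks.sliding_window_view, which produces the windows as a read-only view rather than copies; a minimal sketch:

import numpy as np

windows = np.lib.stride_tricks.sliding_window_view(np.array(arr), 3)
windows.tolist()
# [[2, 3, 5], [3, 5, 7], [5, 7, 11], [7, 11, 13]]

In pandas (1.1+, where Rolling objects are iterable), something similar is possible, though it is unlikely to beat the plain list comprehension for speed:

import pandas as pd

s = pd.Series(arr)
# each iteration yields a window as a Series; drop the short leading windows
[w.tolist() for w in s.rolling(3) if len(w) == 3]
# [[2, 3, 5], [3, 5, 7], [5, 7, 11], [7, 11, 13]]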

There are some other ideas here, but they are focused on general iterables (where you can only pull items out once), rather than lists (where you can access items by index, possibly repeatedly).

Matthias Fripp
    If you don't need a generator, here is a one-liner: `slidingwindows = lambda L, n: [L[i:i+n] for i in range(len(L)-n+1)]` – Basj Jan 04 '22 at 12:34

You can use more_itertools:

import more_itertools
list(more_itertools.windowed(arr, 3))

[(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]

Or, using itertools:

from itertools import islice

def pairwise(iterable, n):
    "s -> (s0,s1,..s(n-1)), (s1,s2,.., sn), (s2, s3,..,s(n+1)), ..."
    iters = iter(iterable)
    result = tuple(islice(iters, n))
    if len(result) == n:
        yield result
    for elem in iters:
        result = result[1:] + (elem,)
        yield result
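
Applied to the question's list, this yields the same windows as tuples:

arr = [2, 3, 5, 7, 11, 13]
print(list(pairwise(arr, 3)))
# [(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]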
Pygirl

You can use strides:

import numpy as np

arr = [2, 3, 5, 7, 11, 13]

def rolling_window(a, window):
    # Build an overlapping-window view of `a` without copying the data
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(np.array(arr), 3)
print(a)
[[ 2  3  5]
 [ 3  5  7]
 [ 5  7 11]
 [ 7 11 13]]


print(a.tolist())
[[2, 3, 5], 
 [3, 5, 7], 
 [5, 7, 11], 
 [7, 11, 13]]
jezrael

If your source list is gigantic, the source should produce values on demand rather than hold everything in memory at once. The way to do that in Python is with a generator.

A hypothetical source generator reading from a file:

def gen_value():
    with open('big-file.txt') as f:
        for line in f:
            for x in line.split():
                yield int(x)

The grouper recipe from the itertools documentation can be used to consume the generator:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

So you may call list(grouper(gen_value(), 3)).
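
Note that grouper yields fixed-length, non-overlapping chunks rather than the overlapping windows shown in the question; for example, applied directly to the question's list:

arr = [2, 3, 5, 7, 11, 13]
list(grouper(arr, 3))
# [(2, 3, 5), (7, 11, 13)]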

sardok