list_values = [...]
gen = (
    list_values[pos : pos + bucket_size]
    for pos in range(0, len(list_values), bucket_size)
)

Is there a way to make this work if list_values is a generator instead? My objective is to reduce RAM usage.

I know that I can use itertools.islice to slice an iterator.

gen = (
    islice(list_values, pos, pos + bucket_size)
    for pos in range(0, len(list_values), bucket_size)
)

The problem is:

  • How would I remove/substitute len(list_values), which doesn't work for generators?
  • Will the use of islice, in this case, reduce peak RAM usage?
An old man in the sea.

1 Answer


You can split a generator into slices with a second generator function that yields blocks of a given size, using itertools.islice:

import itertools

# sample generator
gen_values = (i for i in range(20))

def take_block(gen, block_size):
    # Yield successive lists of up to block_size items from the iterator gen.
    while True:
        sl = list(itertools.islice(gen, block_size))
        if not sl:  # source exhausted
            break
        yield sl

gen = take_block(gen=gen_values, block_size=5)

for b in gen:
    print(b)

[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]
[15, 16, 17, 18, 19]
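
As a rough check on the memory question, the sketch below feeds take_block a source that logs every item it produces. The names consumed and counting_source are hypothetical helpers added for illustration only. It suggests that pulling one block reads only block_size items from the source, and that a shorter final block is still yielded.

```python
import itertools

def take_block(gen, block_size):
    # Same logic as the function above, repeated so this sketch runs on its own.
    while True:
        sl = list(itertools.islice(gen, block_size))
        if not sl:
            break
        yield sl

consumed = []  # hypothetical log of items pulled from the source

def counting_source(n):
    # Hypothetical generator that records each item as it is produced.
    for i in range(n):
        consumed.append(i)
        yield i

blocks = take_block(counting_source(7), block_size=3)

first = next(blocks)                   # pulls exactly one block
consumed_after_first = list(consumed)  # snapshot of what the source has produced so far
rest = list(blocks)                    # remaining blocks, including the short last one

print(first)
print(consumed_after_first)
print(rest)
```

Each block is still materialized as a list, so peak memory is roughly one block plus whatever the consumer keeps around. Note that the function assumes gen is an iterator: passing a list directly would make islice restart from the beginning on every pass, so wrap it with iter() first.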
RomanPerekhrest
  • This will drop data when the final block is incomplete, which is usually not what you want (most people want to either pad or truncate the final block). Even if you wanted to do it, you'd get better performance with a `zip` trick, making your function simplify to `return zip(*[iter(gen)]*block_size)` (which is how [the `itertools` `grouper` recipe](https://docs.python.org/3/library/itertools.html#itertools-recipes) implements it for the `incomplete='ignore'` case). – ShadowRanger Jan 03 '23 at 16:49
  • @ShadowRanger, `zip(*[iter(gen)] * block_size)` doesn't pick the residual block – RomanPerekhrest Jan 03 '23 at 17:04
  • Yeah, that was my point. Your code doesn't pick it either (the first failing `next` call bypasses the rest of the listcomp, dropping that data), so if you *want* to drop the residual block, may as well use the faster `zip` solution. The correct solution is using the `grouper` or `batched` recipe from `itertools` (between the two, and the three modes `grouper` can operate in, can handle any scenario). `batched` produces a truncated final block, `grouper` can do what your code does (dropping the final incomplete block), or raise an exception, or (by default) pad it. The duplicate covers it well. – ShadowRanger Jan 03 '23 at 17:14
  • @ShadowRanger, thanks, I settled on `batched` approach, the `grouper` is also interesting – RomanPerekhrest Jan 03 '23 at 17:54
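
The three behaviours discussed in the comments can be compared side by side. This is a minimal sketch, not the exact recipes from the docs: itertools.batched exists only in Python 3.12 and later, so the fallback definition is an approximation for older versions, and values is a hypothetical sample source with 7 items so that the last block is incomplete.

```python
import itertools

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    # Fallback for older versions; an approximation of the documented recipe.
    def batched(iterable, n):
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(itertools.islice(it, n)):
            yield batch

def values():
    # Hypothetical sample source: 7 items, so the last block is incomplete.
    return (i for i in range(7))

# batched: keeps the final block, truncated to the items that remain.
truncated = list(batched(values(), 3))

# zip trick: silently drops the final incomplete block.
dropped = list(zip(*[iter(values())] * 3))

# zip_longest (the padding mode of grouper): fills the final block with a fill value.
padded = list(itertools.zip_longest(*[iter(values())] * 3, fillvalue=None))

print(truncated)
print(dropped)
print(padded)
```

All three are lazy and yield tuples rather than lists, so only one block is held at a time unless the caller collects them, as the list() calls here do for display.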