I have a generator that yields NumPy arrays, and I need a fast way to build a single NumPy array (an array of arrays) from a fixed number of its yields. Speed is the critical aspect of my problem. I tried np.fromiter, but it does not seem to support building from arrays:

import numpy as np

def generator():
    for i in range(5):
        yield np.array([i]*10)


arr = np.fromiter(iter(generator()), dtype=np.ndarray, count=3)

This throws an error, as described in several other SO posts:

Calling np.sum(np.fromiter(generator))

Numpy ValueError: setting an array element with a sequence

However, I haven't found any answer that offers a rapid way to source arrays from the generator without having to do:

it = iter(generator())
arr = np.array([next(it) for _ in range(3)])

The following post shows that np.fromiter is indeed much faster: Faster way to convert list of objects to numpy array

Is it possible to quickly collect NumPy arrays from the generator without having to use the slow list-to-array conversion? I specifically want to avoid the np.array(list(...)) construct because I will be calling it hundreds of thousands of times, and the delay will add up and make a big difference in execution time.

Jack Avante
  • What's the shape of these arrays? Always the same? Dtype? `dtype=np.ndarray` is the same as `dtype=object`. What `shape` and `dtype` do you expect? – hpaulj Apr 19 '22 at 15:49
  • @hpaulj Didn't know you can specify those in the `dtype` in all honesty. The expected arrays are 3D arrays (images) of type `np.float32` – Jack Avante Apr 20 '22 at 12:45

1 Answer

What about using itertools.islice?

from itertools import islice
g = generator()
arr = np.array(list(islice(g, 3)))

# or in one line:
# arr = np.array(list(islice(generator(), 3)))

output:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
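
If the intermediate list is the concern, one alternative is to preallocate the result and copy each yielded array straight into it. The sketch below is a minimal illustration, not a benchmarked solution: `take_into` is a hypothetical helper name, and it assumes every yielded array has the same known shape and a dtype compatible with the one passed in. Whether it beats `np.array(list(...))` depends on the array sizes, so it is worth timing on real data.

```python
from itertools import islice

import numpy as np


def generator():
    # Same toy generator as in the question: five 1-D arrays of length 10.
    for i in range(5):
        yield np.array([i] * 10)


def take_into(gen, count, shape, dtype):
    """Copy up to `count` yielded arrays into one preallocated array.

    Hypothetical helper: assumes all items share `shape` and fit `dtype`.
    """
    out = np.empty((count, *shape), dtype=dtype)
    n = 0
    for item in islice(gen, count):
        out[n] = item  # copies the item directly into row n, no list in between
        n += 1
    return out[:n]  # trim the result if the generator ran out early


arr = take_into(generator(), count=3, shape=(10,), dtype=np.int64)
print(arr.shape)  # (3, 10)
```

When the same batch size is loaded repeatedly, the `out` buffer could also be allocated once and reused across calls, which avoids a fresh allocation each time.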
mozway
  • You use exactly what I'm trying to avoid using, as described in the post: `np.array(list(...))` but also add an additional function call – Jack Avante Apr 20 '22 at 12:47
  • @JackAvante you should describe your exact use case. `numpy.fromiter` doesn't work on iter**ators** but on iter**ables**. I think the array constructor needs to know the size of the underlying object, which an iter**ator** cannot provide. Also, **one** function call is negligible unless you want to apply this operation many times? – mozway Apr 20 '22 at 12:54
  • Indeed, this operation will be happening many times, which I should've mentioned, that is my bad. That is why speed is the most important aspect. My generator function will provide 3-dimensional numpy arrays (images), which will be loaded very often and repetitively using that same function, so I want to completely avoid the `np.array(list(...))` construction altogether if possible, since it will add up over hundreds of thousands of iterations. – Jack Avante Apr 20 '22 at 14:52
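
For the image use case described in the comments, another option may be `np.fromiter` with a subarray dtype, which NumPy 1.23 and later accept. The following is a minimal sketch, assuming that NumPy version and assuming every image has the same fixed shape. `image_generator` and the `(4, 4, 3)` shape are hypothetical stand-ins for the real loader.

```python
import numpy as np


def image_generator():
    # Hypothetical stand-in for the real loader: small float32 "images".
    for i in range(5):
        yield np.full((4, 4, 3), i, dtype=np.float32)


# A subarray dtype tells fromiter the shape and element type of every item.
# Assumption: NumPy >= 1.23, where fromiter accepts subarray dtypes.
item_dtype = np.dtype((np.float32, (4, 4, 3)))

# count=3 takes exactly three yields and lets NumPy preallocate the output.
batch = np.fromiter(image_generator(), dtype=item_dtype, count=3)
print(batch.shape, batch.dtype)
```

Passing `count` lets NumPy size the output once instead of growing it while iterating. Any speed gain over `np.array(list(...))` should be measured on the actual image sizes.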