As stated in the comments, there's not much you can do.
That said, you can consider these two solutions:
using numpy.fromiter
Instead of creating data = np.empty((n, k)) yourself, use numpy.fromiter with the count argument, which is made specifically for this case, where you know the number of items in advance. This way numpy won't have to "guess" the size and re-allocate until the guess is large enough.
Using fromiter lets the for loop run in C instead of Python. This might be a tiny bit faster, but the real bottleneck will likely be your generators anyway.
Note that fromiter only deals with flat arrays, so you need to read everything as one flat sequence (e.g. using chain.from_iterable) and only then call reshape:
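from itertools import chain

import numpy

n = 20
k = 4
# n generators, each yielding k items
generators = (
    (i*j for j in range(k))
    for i in range(n)
)
# fromiter only accepts a flat iterable, so chain the generators together
flat_gen = chain.from_iterable(generators)
data = numpy.fromiter(flat_gen, 'int64', count=n*k)
data = data.reshape((n, k))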
"""
array([[ 0,  0,  0,  0],
       [ 0,  1,  2,  3],
       [ 0,  2,  4,  6],
       [ 0,  3,  6,  9],
       [ 0,  4,  8, 12],
       [ 0,  5, 10, 15],
       [ 0,  6, 12, 18],
       [ 0,  7, 14, 21],
       [ 0,  8, 16, 24],
       [ 0,  9, 18, 27],
       [ 0, 10, 20, 30],
       [ 0, 11, 22, 33],
       [ 0, 12, 24, 36],
       [ 0, 13, 26, 39],
       [ 0, 14, 28, 42],
       [ 0, 15, 30, 45],
       [ 0, 16, 32, 48],
       [ 0, 17, 34, 51],
       [ 0, 18, 36, 54],
       [ 0, 19, 38, 57]])
"""
using cython
If you can re-use data and want to avoid re-allocating the memory, you can't use numpy's fromiter anymore, because it always returns a new array. IMHO the only way to avoid Python's for loop is then to implement it in Cython. Again, this is very likely overkill, since you still have to read the generators in Python.
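For comparison, the plain Python loop that a Cython version would replace could look roughly like this. It is only a sketch: fill_in_place is a made-up helper name, and the generators are the same toy ones as above, scaled by a hypothetical factor to simulate two successive fills of the same buffer.

```python
import numpy as np

def fill_in_place(data, generators):
    # Overwrite an existing 2-D array element by element.
    # No new array is allocated; the caller keeps re-using `data`.
    for i, gen in enumerate(generators):
        for j, value in enumerate(gen):
            data[i, j] = value
    return data

n, k = 20, 4
data = np.empty((n, k), dtype='int64')  # allocated once, re-used afterwards

# Hypothetical usage: fill the same buffer twice with different generators.
for scale in (1, 2):
    generators = ((scale * i * j for j in range(k)) for i in range(n))
    fill_in_place(data, generators)
```

Both loops here run in Python, which is exactly the part Cython could move to C. The calls into the generators themselves would stay in Python either way.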
For reference, the C implementation of fromiter looks like this: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/multiarray/ctors.c#L4001-L4118
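If you want to check whether fromiter is actually faster for your data, a rough timing sketch could look like this. The helper names (make_generators, fill_with_loop, fill_with_fromiter) and the sizes are made up for illustration, and the numbers will depend on your machine and on how expensive your real generators are.

```python
import timeit
from itertools import chain

import numpy as np

n, k = 200, 40

def make_generators():
    # Hypothetical stand-in for the real generators: n generators of k items.
    return ((i * j for j in range(k)) for i in range(n))

def fill_with_loop():
    # Baseline: pre-allocate, then fill element by element in a Python loop.
    data = np.empty((n, k), dtype='int64')
    for i, gen in enumerate(make_generators()):
        for j, value in enumerate(gen):
            data[i, j] = value
    return data

def fill_with_fromiter():
    # Same result, but the iteration is driven by numpy.fromiter.
    flat_gen = chain.from_iterable(make_generators())
    return np.fromiter(flat_gen, 'int64', count=n * k).reshape((n, k))

loop_time = timeit.timeit(fill_with_loop, number=20)
fromiter_time = timeit.timeit(fill_with_fromiter, number=20)
print(f"python loop: {loop_time:.4f}s, fromiter: {fromiter_time:.4f}s")
```

With generators this cheap, the difference mostly reflects the per-element indexing overhead of the Python loop. With costlier generators the gap should shrink, since the generators dominate.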