Based on the answers here, there doesn't seem to be an easy way to fill a 2D numpy array with data from a generator.
However, if someone can think of a way to vectorize or otherwise speed up the following function, I would appreciate it.
The difference here is that I want to process the values from the generator in batches rather than create the whole array in memory. The only way I could think of to do that was with a for loop.
import numpy as np
from itertools import permutations
permutations_of_values = permutations(range(1,20), 7)
def array_from_generator(generator, arr):
    """Fills the numpy array provided with values from
    the generator provided. Number of columns in arr
    must match the number of values yielded by the
    generator."""
    count = 0
    for row in arr:
        try:
            item = next(generator)
        except StopIteration:
            break
        row[:] = item
        count += 1
    return arr[:count,:]
batch_size = 100000
empty_array = np.empty((batch_size, 7), dtype=int)
batch_of_values = array_from_generator(permutations_of_values, empty_array)
print(batch_of_values[0:5])
Output:
[[ 1  2  3  4  5  6  7]
 [ 1  2  3  4  5  6  8]
 [ 1  2  3  4  5  6  9]
 [ 1  2  3  4  5  6 10]
 [ 1  2  3  4  5  6 11]]
Speed test:
%timeit array_from_generator(permutations_of_values, empty_array)
10 loops, best of 3: 137 ms per loop
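For context, the batch-wise use described above might look something like the sketch below. It repeats the function so that it runs on its own, and it uses a deliberately tiny generator and buffer; the names `small_perms`, `buffer` and `batch_sizes` are illustrative assumptions, not part of the original code.

```python
import numpy as np
from itertools import permutations


def array_from_generator(generator, arr):
    """Fill arr row by row from generator; return the filled part."""
    count = 0
    for row in arr:
        try:
            item = next(generator)
        except StopIteration:
            break
        row[:] = item
        count += 1
    return arr[:count, :]


# Illustrative small case: 4 * 3 = 12 permutations, processed 5 at a time.
small_perms = permutations(range(4), 2)
buffer = np.empty((5, 2), dtype=int)  # reused for every batch

batch_sizes = []
while True:
    batch = array_from_generator(small_perms, buffer)
    if len(batch) == 0:  # generator exhausted
        break
    # Process the batch here; this sketch only records its length.
    batch_sizes.append(len(batch))

print(batch_sizes)  # [5, 5, 2]
```

Note that the returned batch is a view into `buffer`, so it is overwritten by the next call; copy it (for example with `batch.copy()`) if it needs to outlive the loop iteration.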
ADDITION:
As suggested by @COLDSPEED (thanks), here is a version that uses a list to gather the data from the generator. It's roughly 1.6 times as fast as the code above (85.6 ms versus 137 ms per loop). Can anyone improve on this?
permutations_of_values = permutations(range(1,20), 7)
def array_from_generator2(generator, rows=batch_size):
    """Creates a numpy array from a specified number
    of values from the generator provided."""
    data = []
    for row in range(rows):
        try:
            data.append(next(generator))
        except StopIteration:
            break
    return np.array(data)
batch_size = 100000
batch_of_values = array_from_generator2(permutations_of_values, rows=100000)
print(batch_of_values[0:5])
Output:
[[ 1  2  3  4  5  6  7]
 [ 1  2  3  4  5  6  8]
 [ 1  2  3  4  5  6  9]
 [ 1  2  3  4  5  6 10]
 [ 1  2  3  4  5  6 11]]
Speed test:
%timeit array_from_generator2(permutations_of_values, rows=100000)
10 loops, best of 3: 85.6 ms per loop
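One further variant that may be worth timing is sketched below. It avoids the explicit Python loop by taking a batch with `itertools.islice`, flattening the tuples with `itertools.chain.from_iterable`, and handing the flat stream to `np.fromiter`. The function name `array_from_generator3` and its `cols` parameter are assumptions made for this sketch, and it has not been benchmarked here, so whether it actually beats the versions above is untested.

```python
import numpy as np
from itertools import chain, islice, permutations


def array_from_generator3(generator, rows, cols, dtype=int):
    """Build an array of up to `rows` rows from the next tuples of
    `generator`, each of which must contain `cols` values."""
    # islice stops quietly when the generator runs out, so no try/except.
    chunk = islice(generator, rows)
    # np.fromiter needs a flat stream of scalars, so flatten the tuples.
    flat = np.fromiter(chain.from_iterable(chunk), dtype=dtype)
    # Restore the 2D shape; an exhausted generator gives shape (0, cols).
    return flat.reshape(-1, cols)


perms = permutations(range(1, 20), 7)
batch = array_from_generator3(perms, rows=5, cols=7)
print(batch)
```

Calling the function again with the same `perms` object continues from where the previous batch stopped, which matches the batch-wise processing described in the question.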