Usually when we create an array by iteration we either collect the values in a list and create the array from that, or we allocate an empty array and assign values to its slots.
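As a minimal sketch of those two usual approaches, here is a scalar version. The generator `squares` and the length 5 are made up purely for illustration:

```python
import numpy as np

def squares(n):
    # Hypothetical generator, used only for this illustration.
    for i in range(n):
        yield i * i

# Approach 1: collect the values in a list, then build the array from it.
from_list = np.array(list(squares(5)))

# Approach 2: allocate the array first, then assign each value to its slot.
preallocated = np.empty(5, dtype=int)
for i, value in enumerate(squares(5)):
    preallocated[i] = value

print(from_list)
print(preallocated)
```

The second form needs the final length up front, which is the price for skipping the intermediate list.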
Here's a way of doing the assignment, where the generator returns a tuple of arrays:
import numpy as np
def mk_array(N):
    for i in range(N):
        img=np.ones((2,3,3),int)
        L=img[:,:,:1]*i                      # int array, shape (2,3,1)
        ab=img[:,:,1:].astype(float)*i/10    # float array, shape (2,3,2)
        yield L,ab
I made one an array of ints, the other an array of floats. That reduces the temptation to concatenate them into one.
In [157]: g=mk_array(4)
In [158]: for i,v in enumerate(g):
   .....:     print(v[0].shape,v[1].shape)
   .....:
(2, 3, 1) (2, 3, 2)
(2, 3, 1) (2, 3, 2)
(2, 3, 1) (2, 3, 2)
(2, 3, 1) (2, 3, 2)
Let's allocate target arrays of the right shape; here I put the iteration axis 3rd, but it could be anywhere:
In [159]: L, ab = np.empty((2,3,4,1),int), np.empty((2,3,4,2),float)
In [160]: for i,v in enumerate(mk_array(4)):   # fresh generator; g was exhausted by the loop above
   .....:     L[...,i,:], ab[...,i,:] = v
My guess is that this is as fast as any fromiter or stack alternative. And when the components are generated by reading from files, that step is bound to be the most expensive, more so than the iteration mechanism or the array copies.
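For comparison, a stack-based version might look like the sketch below. It repeats the generator so that it runs on its own, and it picks axis=2 as one way to reproduce the layout used above with the iteration axis 3rd; I have not timed it.

```python
import numpy as np

def mk_array(N):
    # Same generator as above, repeated so this block is self-contained.
    for i in range(N):
        img = np.ones((2, 3, 3), int)
        yield img[:, :, :1] * i, img[:, :, 1:].astype(float) * i / 10

# Collect each component in its own list ...
L_list, ab_list = [], []
for L_i, ab_i in mk_array(4):
    L_list.append(L_i)
    ab_list.append(ab_i)

# ... then join the pieces along a new axis inserted at position 2.
L = np.stack(L_list, axis=2)
ab = np.stack(ab_list, axis=2)
print(L.shape, ab.shape)   # (2, 3, 4, 1) (2, 3, 4, 2)
```

This does not need N in advance, but it holds every piece in memory twice for a moment: once in the lists and once in the stacked result.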
================
If the iterator returns a tuple of scalars, we can use fromiter:
def mk_array1(N):
    for i in range(N):
        img=np.ones((2,3,3),int)
        L=img[:,:,:1]*i
        ab=img[:,:,1:].astype(float)*i/10
        for i,j in zip(L.ravel(),ab.ravel()):
            yield i,j
In [184]: g=mk_array1(2)
In [185]: V=np.fromiter(g,dtype=('i,f'))
producing a 1d structured array:
In [186]: V
Out[186]:
array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
(1, 0.10000000149011612), (1, 0.10000000149011612),
(1, 0.10000000149011612), (1, 0.10000000149011612),
(1, 0.10000000149011612), (1, 0.10000000149011612)],
dtype=[('f0', '<i4'), ('f1', '<f4')])
which can be reshaped, and arrays separated by field name:
In [187]: V['f0']
Out[187]: array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=int32)
In [188]: V.reshape(2,2,3)['f0']
Out[188]:
array([[[0, 0, 0],
[0, 0, 0]],
[[1, 1, 1],
[1, 1, 1]]], dtype=int32)
In [189]: V.reshape(2,2,3)['f1']
Out[189]:
array([[[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
[[ 0.1, 0.1, 0.1],
[ 0.1, 0.1, 0.1]]], dtype=float32)
================
What if I define a more complex dtype, one where each field has an array:
In [200]: dt=np.dtype([('f0',int,(2,3,1)),('f1',float,(2,3,2))])
In [201]: g=mk_array(2) # the original generator
In [202]: V=np.fromiter(g,dtype=dt)
In [203]: V['f0']
Out[203]:
array([[[[0],
[0],
[0]],
....
[[1],
[1],
[1]]]])
In [204]: _.shape
Out[204]: (2, 2, 3, 1)
This use of a compound dtype with fromiter is also described in https://stackoverflow.com/a/12473478/901925
This is, in effect, a variation on the usual way of building a structured array: from a list of tuples. More than once I've used the expression:
np.array([tuple(x) for x in something], dtype=dt)
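A small worked instance of that expression, with a made-up dtype and made-up rows (the rows are lists on purpose, to show why the tuple() conversion is there):

```python
import numpy as np

# Hypothetical two-field dtype and data, for illustration only.
dt = np.dtype([('f0', int), ('f1', float)])
rows = [[0, 0.0], [1, 0.1], [2, 0.2]]

# Each record has to be a tuple, so the inner lists are converted first.
V = np.array([tuple(x) for x in rows], dtype=dt)

print(V['f0'])
print(V['f1'])
```

Each tuple becomes one record, and the fields can then be pulled out by name just as with the fromiter result.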
In sum we can time two methods of creating 2 arrays:
def foo1(N):
    g = mk_array(N)
    L, ab = np.empty((N,2,3,1),int), np.empty((N,2,3,2),float)
    for i,v in enumerate(g):
        L[i,...], ab[i,...] = v
    return L, ab
def foo2(N):
    dt=np.dtype([('f0',int,(2,3,1)),('f1',float,(2,3,2))])
    g = mk_array(N)
    V=np.fromiter(g, dtype=dt)
    return V['f0'], V['f1']
For a wide range of N these 2 functions take nearly the same time. I have to push run times to 1s before I start seeing an advantage for foo1.
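One way to reproduce such a comparison is sketched below with the standard timeit module. The N values and the repeat count are arbitrary choices of mine, the functions are repeated so the block runs on its own, and it assumes a NumPy version in which fromiter accepts this compound dtype, as in the session above. Actual numbers will depend on the machine, so none are shown.

```python
import timeit
import numpy as np

def mk_array(N):
    for i in range(N):
        img = np.ones((2, 3, 3), int)
        yield img[:, :, :1] * i, img[:, :, 1:].astype(float) * i / 10

def foo1(N):
    # Preallocate both targets and fill them slot by slot.
    L, ab = np.empty((N, 2, 3, 1), int), np.empty((N, 2, 3, 2), float)
    for i, v in enumerate(mk_array(N)):
        L[i, ...], ab[i, ...] = v
    return L, ab

def foo2(N):
    # Let fromiter build one structured array, then split it by field.
    dt = np.dtype([('f0', int, (2, 3, 1)), ('f1', float, (2, 3, 2))])
    V = np.fromiter(mk_array(N), dtype=dt)
    return V['f0'], V['f1']

# Sanity check: both functions should produce the same values.
La, aba = foo1(10)
Lb, abb = foo2(10)
same = np.array_equal(La, Lb) and np.array_equal(aba, abb)
print("results match:", same)

# Arbitrary sizes and repeat count, chosen only for illustration.
timings = {}
for N in (10, 100, 1000):
    t1 = timeit.timeit(lambda: foo1(N), number=5)
    t2 = timeit.timeit(lambda: foo2(N), number=5)
    timings[N] = (t1, t2)
    print(f"N={N}: foo1 {t1:.4f}s, foo2 {t2:.4f}s")
```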