@Paul Panzer shared an excellent answer on how to compute the Cartesian product of a list of NumPy arrays efficiently. I have modified his cartesian_product_transpose_pp(arrays) function so that the iteration proceeds from the left column to the right column of the returned array.
import numpy as np
import itertools
import time

def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    # one buffer holding every combination: shape (la, len(arrays[0]), len(arrays[1]), ...)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i]] #my modification
    return arr.reshape(la, -1).T
mumax = 18
mumin = 1
nsample = 8
mu_list = [ i for i in range(mumin, mumax+1, 1) ]
mu_array = np.array( mu_list, dtype=np.uint8 )
mu_alist = [ mu_array ] * nsample
start = time.time()
cartesian_product_transpose_pp( mu_alist )
end = time.time()
print( f'\ncartesian_product_transpose_pp Time: {(end - start)}sec' )
However, when this function's argument (i.e. arrays) exceeds a certain size, it requires a very large arr and fails with a MemoryError. Example:
arr = np.empty( ( la, *map(len, arrays) ), dtype=dtype )
MemoryError: Unable to allocate 82.1 GiB for an array with shape (8, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
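To make the size of that allocation concrete, the small sketch below reproduces the figure in the error message. The buffer has shape (la, len(arrays[0]), len(arrays[1]), ...), so it needs la times the product of the array lengths times the item size in bytes. The helper name full_product_nbytes is hypothetical and only used for illustration; it is not part of NumPy.

```python
import math
import numpy as np

def full_product_nbytes(arrays):
    """Bytes needed for the (la, n0, n1, ...) buffer (hypothetical helper)."""
    la = len(arrays)
    itemsize = np.result_type(*arrays).itemsize
    return la * math.prod(len(a) for a in arrays) * itemsize

mu_array = np.arange(1, 19, dtype=np.uint8)       # values 1..18, as in the question
nbytes = full_product_nbytes([mu_array] * 8)      # nsample = 8
print(f"{nbytes} bytes = {nbytes / 2**30:.1f} GiB")  # 8 * 18**8 bytes = 82.1 GiB
```

Each extra sample multiplies the row count by 18, so the requirement grows exponentially with nsample.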
To address this memory error, I would like to break arr into smaller chunks so that I can yield smaller chunks of arr.reshape(la, -1).T.
How do I do this as the value of nsample increases?
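For illustration, here is a minimal sketch of what such chunking could look like. It is not Paul Panzer's method: instead of broadcasting into one large buffer, it converts a range of flat row numbers into per-array indices with np.unravel_index and builds only those rows. It assumes the leftmost column should vary fastest; the function name cartesian_product_chunks and the chunk_rows parameter are hypothetical.

```python
import math
import numpy as np

def cartesian_product_chunks(arrays, chunk_rows=1_000_000):
    """Yield the Cartesian product in blocks of at most chunk_rows rows.

    Hypothetical sketch: column 0 varies fastest, the last column slowest.
    """
    arrays = [np.asarray(a) for a in arrays]
    lengths = [len(a) for a in arrays]
    total = math.prod(lengths)            # total number of rows (Python int)
    dtype = np.result_type(*arrays)
    for start in range(0, total, chunk_rows):
        stop = min(start + chunk_rows, total)
        flat = np.arange(start, stop, dtype=np.int64)
        # order='F' makes the first index change fastest
        idx = np.unravel_index(flat, lengths, order='F')
        out = np.empty((stop - start, len(arrays)), dtype=dtype)
        for col, (a, ix) in enumerate(zip(arrays, idx)):
            out[:, col] = a[ix]
        yield out

# Small usage example: 2 * 3 = 6 rows, delivered in blocks of up to 4 rows
small = [np.array([1, 2]), np.array([3, 4, 5])]
for block in cartesian_product_chunks(small, chunk_rows=4):
    print(block.shape)
```

Peak memory per block is roughly chunk_rows times la times the item size for the output, plus the temporary index arrays (8 bytes per entry on a typical 64-bit build), so chunk_rows can be tuned independently of nsample. The total run time still grows with the full number of rows.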
Updated test code that I am now using:
import numpy as np
import itertools
import time
import sys
def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i]]
    return arr.reshape(la, -1).T
mumax = 18
mumin = 1
nsample = 9
mu_list = [ i for i in range(mumin, mumax+1, 1) ]
mu_array = np.array( mu_list, dtype=np.uint8 )
mu_alist = [ mu_array ] * nsample
a = mu_alist
start = time.time()
c = 1
result = (
    cartesian_product_transpose_pp( [ *x[:,None], *a[c:] ] )
    for x in cartesian_product_transpose_pp( a[:c] )
)
with np.printoptions(threshold=sys.maxsize):
    for n, chunk in enumerate( result ):
        #print( n, chunk ) #for debugging
        pass  # do not rebind `a` here: the generator reads a[c:] lazily
end = time.time()
print( f'\ncartesian_product_transpose_pp Time: {(end - start)}' )
Error message:
arr = np.empty((la, *map(len, arrays)), dtype=dtype)
MemoryError: Unable to allocate 92.4 GiB for an array with shape (9, 1, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
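The shape in this message shows why splitting with c = 1 does not help: only the first array is reduced to a single value, so each chunk still spans the remaining eight arrays. The sketch below estimates the size of one chunk for different values of c, assuming uint8 data and nine arrays of length 18 as in the test code. The helper chunk_nbytes is hypothetical.

```python
import math

def chunk_nbytes(lengths, c, itemsize=1):
    """Bytes for one chunk when the first c arrays are fixed to single values
    (hypothetical helper; itemsize=1 corresponds to uint8)."""
    la = len(lengths)
    return la * math.prod(lengths[c:]) * itemsize

lengths = [18] * 9                      # nsample = 9, 18 values per array
for c in range(1, len(lengths)):
    size = chunk_nbytes(lengths, c)
    print(f"c = {c}: {size} bytes per chunk ({size / 2**30:.3g} GiB)")
```

With c = 1 this gives 9 * 18**8 bytes, which matches the 92.4 GiB in the error message. Each increment of c divides the chunk size by 18, at the cost of 18 times as many chunks.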