I have a large number of big 3-dimensional boolean arrays that I need to store. They contain both False and True values, but for illustration consider the following all-False array of comparable shape:
import numpy as np
bool_array = np.zeros((20000, 20000, 5), dtype=bool)  # avoids a 16 GB float64 temporary; np.bool was removed in NumPy 1.24
When I save and reload it with
np.save('bool_array.npy', bool_array)
bool_array = np.load('bool_array.npy')
the resulting file is over 2 GB and loading is slow (4 to 5 seconds).
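The example array has 20000 × 20000 × 5 = 2×10^9 elements and NumPy stores each bool in a whole byte, which is why the .npy file is about 2 GB. A minimal sketch of one dense alternative (not benchmarked at full scale): np.packbits stores 8 booleans per byte and np.unpackbits restores them. The helper names save_packed/load_packed and the .npz keys packed and shape are my own hypothetical choices, not an established format; unpackbits(count=...) needs NumPy 1.17 or later.

```python
import math
import os
import tempfile

import numpy as np


def save_packed(path, arr):
    """Save a boolean array at 1 bit per element (hypothetical helper)."""
    # axis=None flattens the array and packs 8 booleans into each uint8
    packed = np.packbits(arr, axis=None)
    # Keep the original shape so the array can be restored exactly
    np.savez(path, packed=packed, shape=np.array(arr.shape))


def load_packed(path):
    """Load an array written by save_packed and restore its shape."""
    with np.load(path) as data:
        shape = tuple(int(s) for s in data["shape"])
        n = math.prod(shape)  # Python ints, so no int32 overflow on any platform
        # count=n drops the padding bits that filled out the last byte
        bits = np.unpackbits(data["packed"], count=n)
    # The unpacked values are 0/1 uint8, so viewing them as bool avoids a copy
    return bits.view(bool).reshape(shape)


# Small demo: 7 * 9 * 5 = 315 elements, not a multiple of 8, so padding is exercised
demo = np.random.default_rng(0).random((7, 9, 5)) < 0.1
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_packed.npz")
    save_packed(path, demo)
    restored = load_packed(path)
print(np.array_equal(demo, restored))
```

For the example shape this gives about 250 MB (2×10^9 bits / 8) regardless of how many True values there are, so it does not exploit the sparsity.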
Note that bool_array is very sparse: each row of each of the five slices contains at most 100 True values.
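With at most 100 True values per row per slice, the array holds at most 20000 × 5 × 100 = 10^7 True entries, so storing only their coordinates can be much smaller than the dense array. A minimal sketch, assuming the dense array fits in memory when saving and loading: np.nonzero yields the (row, column, slice) indices, each axis is downcast to the smallest type that holds it via np.min_scalar_type (uint16 for 20000, uint8 for 5), and np.savez_compressed writes them. The function names and the .npz key names are hypothetical.

```python
import os
import tempfile

import numpy as np


def _index_dtype(axis_len):
    # Smallest dtype able to hold indices 0 .. axis_len - 1 (uint16 for 20000)
    return np.min_scalar_type(axis_len - 1)


def save_sparse(path, arr):
    """Save only the coordinates of the True entries of a 3-D boolean array."""
    rows, cols, slices = np.nonzero(arr)
    np.savez_compressed(
        path,
        rows=rows.astype(_index_dtype(arr.shape[0])),
        cols=cols.astype(_index_dtype(arr.shape[1])),
        slices=slices.astype(_index_dtype(arr.shape[2])),
        shape=np.array(arr.shape),
    )


def load_sparse(path):
    """Rebuild the dense boolean array from the saved coordinates."""
    with np.load(path) as data:
        arr = np.zeros(tuple(int(s) for s in data["shape"]), dtype=bool)
        # Fancy indexing sets every stored (row, column, slice) position to True
        arr[data["rows"], data["cols"], data["slices"]] = True
    return arr


# Small demo: 300 rows force uint16 row indices, as 20000 rows would
demo = np.random.default_rng(1).random((300, 40, 5)) < 0.02
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_sparse.npz")
    save_sparse(path, demo)
    with np.load(path) as stored:
        row_dtype = stored["rows"].dtype
    restored = load_sparse(path)
print(np.array_equal(demo, restored), row_dtype)
```

In the worst case this costs 5 bytes per True entry (2 + 2 + 1), about 50 MB before compression. One scipy.sparse matrix per slice, saved with scipy.sparse.save_npz, is an alternative along the same lines.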
What would be a more compact and faster way to save bool_array (for example a different file format, or one that takes advantage of the sparsity)?