I use arrays stored in NumPy's .npz format. I have a lot of these files, all sharing the same structure: each file, named my_file_var1_var2_var3.npz, contains the following items (all arrays are 32-bit floats):
- a 2D array (N=11, Ns=2000)
- a 2D array (12, N)
- a 2D array (300, N)
- a 2D array (300, Ns)
- a float
- an integer
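
To make the layout concrete, here is a minimal sketch of how one such file could be written and read back. The key names (`a`, `b`, `c`, `d`, `scalar_float`, `scalar_int`) and the values 1, 2, 3 for var1, var2, var3 are placeholders for illustration, not the real names in my files.

```python
import os
import tempfile

import numpy as np

N, Ns = 11, 2000
rng = np.random.default_rng(0)

# Hypothetical key names; the real files may use different ones.
contents = {
    "a": rng.random((N, Ns), dtype=np.float32),
    "b": rng.random((12, N), dtype=np.float32),
    "c": rng.random((300, N), dtype=np.float32),
    "d": rng.random((300, Ns), dtype=np.float32),
    "scalar_float": np.float32(0.5),
    "scalar_int": np.int32(3),
}

# One file per (var1, var2, var3) combination, here 1, 2, 3.
path = os.path.join(tempfile.mkdtemp(), "my_file_1_2_3.npz")
np.savez(path, **contents)

# All items of a file are needed together, so load every key at once.
with np.load(path) as data:
    loaded = {key: data[key] for key in data.files}
```

Note that the two scalars come back as 0-dimensional arrays, so they need `float(...)` or `int(...)` to be used as plain Python numbers.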
It's quite annoying to have more than 1000 files, each taking up about 4 MB. I was thinking it would be good to move them into a single container, such as HDF5/PyTables or similar. The arrays are plain arrays with no preferential ordering (they are effectively matrices or stacks of vectors that will be operated on). All the arrays in a given file are needed together.
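
As a rough sketch of the kind of container I have in mind (not a tested design), the following uses h5py with one HDF5 group per (var1, var2, var3) combination, the arrays as gzip-compressed datasets and the two scalars as group attributes. The helper names `group_name`, `save_case` and `load_case`, the dataset names and the file name `all_cases.h5` are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def group_name(var1, var2, var3):
    # Hypothetical naming scheme mirroring the current file names.
    return f"{var1}_{var2}_{var3}"


def save_case(h5file, var1, var2, var3, arrays, scalars):
    """Store one former .npz file as a group inside an open HDF5 file."""
    grp = h5file.create_group(group_name(var1, var2, var3))
    for name, arr in arrays.items():
        grp.create_dataset(name, data=arr, compression="gzip")
    for name, value in scalars.items():
        grp.attrs[name] = value


def load_case(h5file, var1, var2, var3):
    """Read back every array and scalar belonging to one combination."""
    grp = h5file[group_name(var1, var2, var3)]
    arrays = {name: ds[...] for name, ds in grp.items()}
    scalars = dict(grp.attrs)
    return arrays, scalars


N, Ns = 11, 2000
arrays = {
    "a": np.zeros((N, Ns), dtype=np.float32),
    "b": np.zeros((12, N), dtype=np.float32),
    "c": np.zeros((300, N), dtype=np.float32),
    "d": np.ones((300, Ns), dtype=np.float32),
}
scalars = {"scalar_float": 0.5, "scalar_int": 3}

path = os.path.join(tempfile.mkdtemp(), "all_cases.h5")
with h5py.File(path, "w") as f:
    save_case(f, 1, 2, 3, arrays, scalars)

with h5py.File(path, "r") as f:
    got_arrays, got_scalars = load_case(f, 1, 2, 3)
```

With this layout, fetching everything for one combination is a single group lookup, which matches the fact that the arrays are always used together.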
Are there any recommendations on which format would be best for retrieving the arrays associated with var1, var2 and var3, while being portable and storage-efficient?