I have thousands of binary files that I have to read and store in memory to work on the data. I already have a function that reads those data, but I would like to improve it, because it is rather slow.
The data are organized this way:
- 1000 cubes.
- each cube is written in 10 binary files.
For the moment I have a reading function that reads and returns ONE cube as a numpy array (read_1_cube). Then I loop over all the files to extract all the cubes and concatenate them.
import numpy as np

N = 100   # edge length of one sub-cube (placeholder, set to the real size)

def read_1_cube( dataNum ):
    ### read the 10 subfiles and concatenate the arrays
    N_subfiles = 10
    fnames_subfiles = ( '%d_%d' % (dataNum, k) for k in range(N_subfiles) )
    return np.concatenate( [np.fromfile( open(fn, 'rb'), dtype=float, count=N*N*N ).reshape((N, N, N)) for fn in fnames_subfiles], axis=2 )

TotDataNum = 1000
my_full_data = np.concatenate( [read_1_cube(d) for d in range(TotDataNum)], axis=0 )
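To make the expected layout explicit, these are the shapes this construction should produce (each subfile holds one N*N*N block, 10 subfiles per cube, 1000 cubes):

# one cube : 10 subfiles concatenated along axis 2 -> (N, N, 10*N)
# full data: 1000 cubes concatenated along axis 0 -> (1000*N, N, 10*N)
assert read_1_cube(0).shape == (N, N, 10*N)
assert my_full_data.shape == (TotDataNum*N, N, 10*N)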
I try to work with generators to limit the amount of memory used. With those functions it takes ~2.5 s per file, so about 45 min for the 1000 files. In the end I will have 10000 files, so it is not doable (of course, I will not read the 10000 files at once, but still I cannot work if it takes 1 h for 1000 files).
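For scale, this is roughly the amount of data involved (assuming float64, i.e. 8 bytes per value, as implied by dtype=float in np.fromfile):

# rough memory estimate: one cube = 10 subfiles of N*N*N float64 values
bytes_per_cube = 10 * N**3 * 8
bytes_total    = TotDataNum * bytes_per_cube
print( 'one cube : %.2f GB' % (bytes_per_cube / 1e9) )
print( 'full data: %.2f GB' % (bytes_total / 1e9) )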
My questions:
- do you know a way to optimize read_1_cube and the generation of my_full_data?
- do you see a better way (without read_1_cube)?
- another possible optimization: do you know of a concatenate-like function that can work on a generator (like sum(), min(), max(), list()...)?
Edit: following the comment of @liborm about np.concatenate, I found other equivalent functions (stack concatenate question): np.r_, np.stack, np.hstack. The good point is that np.stack can take a generator as input, so I push the generators as far as possible and create the actual data array only once, at the end.
N_subfiles = 10

def read_1_cube( dataNum ):
    ### read the 10 subfiles and return a generator of sub-arrays
    fnames_subfiles = ( '%d_%d' % (dataNum, k) for k in range(N_subfiles) )
    return ( np.fromfile( open(fn, 'rb'), dtype=float, count=N*N*N ).reshape((N, N, N)) for fn in fnames_subfiles )

def read_N_cube( datanum ):
    ### make a generator of 'cube generators', stack everything at the end
    C = ( np.stack( read_1_cube(d), axis=2 ).reshape((N, N, N*N_subfiles)) for d in range(datanum) )
    return np.stack( C ).reshape( (datanum*N, N, N*N_subfiles) )
### The full allocation is done here, just once
my_full_data = read_N_cube( TotDataNum )
It is quicker than the first version: the first version needed 2.4 s to read 1 file, while the second takes 6.2 s to read 10 files!
I do not think there is much room left for optimization, but I am sure that there is still a better algorithm out there!
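One more direction I have in mind but have not tested yet: since the final shape is known in advance, the output could be preallocated once with np.empty and each subfile read directly into its slice, avoiding any concatenate/stack. A rough sketch, reusing N, N_subfiles and TotDataNum from above:

def read_all_cubes( TotDataNum ):
    ### untested sketch: allocate the final array once, then read each subfile into its slice
    out = np.empty( (TotDataNum*N, N, N_subfiles*N), dtype=float )
    for d in range(TotDataNum):
        for k in range(N_subfiles):
            fn = '%d_%d' % (d, k)
            # np.fromfile also accepts a filename directly
            out[d*N:(d+1)*N, :, k*N:(k+1)*N] = np.fromfile( fn, dtype=float, count=N*N*N ).reshape((N, N, N))
    return out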