I want to train a neural network. I work with Python (3.6.9) and TensorFlow (2.4.0), and my problem is that my dataset is too big to fit in memory.
A bit of context:
- My network takes as input a small complex matrix of dimension 64 by 32.
- My dataset is stored as a very large ".mat" file generated by MATLAB code.
- In the .mat file, the samples are stored in a large cell array.
- I use the h5py library to open the mat file.
Example of Python code that loads a single sample from the file:
import h5py
import numpy as np

f = h5py.File('dataset.mat', 'r')
refs = f['data'] # array of references, one per sample
sample = f[refs[0]][()].view(np.complex128) # load the first sample
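For reference, here is a minimal sketch of what the `.view(...)` call does. It assumes h5py exposes a MATLAB complex array as a compound dtype with two float64 fields named 'real' and 'imag'; the array `raw` and its values are stand-ins, not data from the real file. The cast to complex64 and the transpose mirror what appears in the traceback further down.

```python
import numpy as np

# Stand-in for the raw array h5py would return (assumption: compound
# dtype with float64 fields 'real' and 'imag', stored as 32 by 64).
compound = np.dtype([('real', np.float64), ('imag', np.float64)])
raw = np.zeros((32, 64), dtype=compound)
raw['real'] = 1.5
raw['imag'] = -2.0

# Reinterpret each 16-byte (real, imag) record as one complex128 value.
# This is a view on the same memory, not a copy.
sample = raw.view(np.complex128)

# Cast to complex64 and transpose to get a 64 by 32 matrix.
sample64 = sample.astype(np.complex64).T
```

The view only works because the record size (2 x 8 bytes) matches the size of one complex128 value.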
Currently, I load only a small part of the dataset into memory and store it in a TensorFlow dataset (ds = tf.data.Dataset.from_tensor_slices(datas)).
I would like to take advantage of h5py's ability to read each sample individually, so that samples are loaded on the fly during training.
I tried the following approach:
f = h5py.File('dataset.mat', 'r')
refs = f['data'] # array of references, one per sample
ds_index = tf.data.Dataset.range(len(refs))
ds = ds_index.map(lambda i: f[refs[i]][()].view(np.complex128))
but I get the following error:
NotImplementedError: in user code:
<ipython-input-66-6cf802c8359a>:15 __call__ *
return self._f[self._rs[i]]['channel'][()].view(np.complex).astype(np.complex64).T
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:855 __array__
" a NumPy call, which is not supported".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (args_0:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
Do you know how to fix this error, or is there a better way to load samples on the fly?
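One possible direction, given as a sketch under assumptions rather than a verified fix: the error message suggests that the function passed to `map` receives a symbolic tensor instead of a concrete integer, so the h5py lookup cannot run there. A plain Python generator avoids this, because its loop index is an ordinary int. In the sketch below, `make_generator`, `load_sample` and `fake_load` are hypothetical names; `fake_load` stands in for the real h5py read so that the code runs without the .mat file.

```python
import numpy as np

def make_generator(load_sample, n_samples):
    """Build a zero-argument generator function that loads samples lazily.

    load_sample is a hypothetical callable mapping an int index to one
    NumPy array; with the real file it would wrap the h5py lookup.
    """
    def generator():
        for i in range(n_samples):
            # i is a plain Python int here, so NumPy/h5py indexing works.
            yield load_sample(i).astype(np.complex64)
    return generator

# Stand-in loader so the sketch runs without the real file (assumption).
def fake_load(i):
    return np.full((64, 32), i + 1j * i, dtype=np.complex128)

gen = make_generator(fake_load, n_samples=3)
samples = list(gen())  # only for illustration; training would iterate lazily
```

A generator function like `gen` could then be passed to `tf.data.Dataset.from_generator`, declaring the element dtype and shape, so that only one sample is read from disk at a time. Wrapping the lookup in `tf.py_function` inside `map` is another option. Whether complex dtypes pass through unchanged in TensorFlow 2.4 should be checked against the actual file.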