I have an MNIST-like dataset that does not fit in memory (process memory, not GPU memory).
My dataset is 4GB.
This is not a TFLearn issue.
As far as I know, model.fit requires an array for x and an array for y.
TFLearn example:
model.fit(x, y, n_epoch=10, validation_set=(val_x, val_y))
I was wondering if there's a way to pass a "batch iterator" instead of an array. For each batch I would load only the necessary data from disk, so I would not run into process memory overflow errors.
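To illustrate what I mean, here is a rough sketch of the kind of iterator I have in mind. The per-batch .npy file layout and the paths are hypothetical, just to show the shape of the API I am after:

import os
import numpy as np

def batch_iterator(data_dir):
    # Hypothetical layout: the dataset pre-split on disk into pairs of
    # files like batch_000_x.npy / batch_000_y.npy, one pair per batch.
    x_files = sorted(f for f in os.listdir(data_dir) if f.endswith("_x.npy"))
    for x_name in x_files:
        y_name = x_name.replace("_x.npy", "_y.npy")
        # Only one batch is ever held in process memory at a time.
        x_batch = np.load(os.path.join(data_dir, x_name))
        y_batch = np.load(os.path.join(data_dir, y_name))
        yield x_batch, y_batch

# What I would like to be able to write (as far as I know, this call
# signature does not exist in TFLearn):
# model.fit(batch_iterator("data/train"), n_epoch=10)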
EDIT
np.memmap could be an option, but I don't see how to skip the first few bytes that make up the header.
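For what it's worth, this is the direction I was exploring. The file name, the 16-byte header size, and the array shape are my assumptions about an MNIST-style IDX image file, and I am not sure whether relying on the offset argument like this is the right approach for a 4GB-scale file:

import numpy as np

# Assumption: a raw MNIST-style IDX image file, i.e. a 16-byte header
# followed by 60000 x 28 x 28 unsigned bytes. The path, header size,
# and shape would need to be adjusted for a different format.
images = np.memmap(
    "train-images-idx3-ubyte",   # hypothetical path
    dtype=np.uint8,
    mode="r",
    offset=16,                   # skip the header bytes
    shape=(60000, 28, 28),
)

# Slicing a memmap only reads the pages it touches, so one batch can be
# materialized without loading the whole file into process memory.
batch = np.asarray(images[0:128], dtype=np.float32) / 255.0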