I have 60 GB of data stored as 20 .npy files, and I want to build a neural net in TensorFlow to learn from this data.
I plan to train on 19 of the files and test on the remaining 1. Each file has roughly 80 columns of x data (np.float64) and 1 column of categorical y data (np.int64). I cannot reduce these to smaller dtypes because the rounding errors would destroy valuable information.
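For concreteness, one file looks roughly like this when I load it (a sketch; the file name is made up, and I'm assuming the features and the label are stored together, with the label cast back to np.int64 after loading):

```python
import numpy as np

# Hypothetical file name; there are 20 of these, ~3 GB each.
data = np.load("shard_01.npy")       # shape ~ (n_rows, 81), dtype float64

x = data[:, :80]                     # ~80 feature columns (np.float64)
y = data[:, 80].astype(np.int64)     # 1 categorical label column
```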
Loading a single file into my neural net works fine, but training is the problem, because I need to learn across all of the data. I cannot train on the files in sequential order (for example, train on files 1-19 in order 1, 2, 3, ..., 19). I need to somehow shuffle all of the data for each epoch.
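For reference, the single-file case that works for me looks something like this (a minimal sketch, assuming the shard layout above; the model is just a stand-in):

```python
import numpy as np
import tensorflow as tf

data = np.load("shard_01.npy")                    # one file fits in memory
x, y = data[:, :80], data[:, 80].astype(np.int64)

ds = (tf.data.Dataset.from_tensor_slices((x, y))
        .shuffle(buffer_size=len(x))              # full shuffle: fits in RAM
        .batch(1024))

# Stand-in model, assuming 10 classes, just to show the training call
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(80,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(ds, epochs=10)
```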
I've read posts like this one, which looks almost identical to my question. However, my question is different because I need to shuffle across multiple files. I have not seen a question like this answered on Stack Overflow.
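The direction I've been experimenting with is a tf.data generator that walks the files in a fresh random order each epoch, streams out individual rows, and shuffles them through a buffer (a sketch; file names and sizes are made up):

```python
import numpy as np
import tensorflow as tf

train_paths = [f"shard_{i:02d}.npy" for i in range(1, 20)]  # 19 training files

def make_dataset(paths, batch_size=1024):
    def gen():
        # New random file order every time the dataset is iterated,
        # i.e., at the start of every epoch.
        for i in np.random.permutation(len(paths)):
            data = np.load(paths[i])
            yield data[:, :80], data[:, 80].astype(np.int64)

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            tf.TensorSpec(shape=(None, 80), dtype=tf.float64),
            tf.TensorSpec(shape=(None,), dtype=tf.int64)))
    ds = ds.unbatch()                 # stream individual rows, not whole files
    # Approximate shuffle: 500k rows of 80 float64 ≈ 320 MB of buffer,
    # which is nowhere near the full 60 GB.
    ds = ds.shuffle(buffer_size=500_000)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(train_paths)
```

With this sketch, rows only mix within the ~500k-row buffer, so rows from, say, file 1 and file 19 can never land in the same batch. That cross-file mixing is the part I can't figure out.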