I have a dataset of 40,000 examples with shape dataset=(40000, 2048). After some processing, I would like to store and load this dataset efficiently. The dataset is a NumPy array.
I used pickle to store this dataset, but it takes a long time to store and even longer to load, and I sometimes get a memory error.
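For reference, this is roughly how I saved and loaded the full array in one go (the file name is just for illustration):

import pickle
import numpy as np

# train_frames is a (40000, 2048) NumPy array
with open('dataset_full.sav', 'wb') as handle:
    pickle.dump(train_frames, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('dataset_full.sav', 'rb') as handle:
    train_frames = pickle.load(handle)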
I then tried to split the dataset into several chunks as follows:
with open('dataset_10000.sav', 'wb') as handle:
    pickle.dump(train_frames[:10000], handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('dataset_20000.sav', 'wb') as handle:
    pickle.dump(train_frames[10000:20000], handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('dataset_30000.sav', 'wb') as handle:
    pickle.dump(train_frames[20000:30000], handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('dataset_35000.sav', 'wb') as handle:
    pickle.dump(train_frames[30000:35000], handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('dataset_40000.sav', 'wb') as handle:
    pickle.dump(train_frames[35000:], handle, protocol=pickle.HIGHEST_PROTOCOL)
However, I still get a memory error and the files are too heavy.
What is the best/most optimized way to save and load such a large array to and from disk?
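In case it clarifies what I am looking for, this is the kind of alternative I am wondering about, e.g. NumPy's own binary format (the file name and the mmap_mode flag are only my assumptions about how this would be used, not something I have working):

import numpy as np

# Save the array in NumPy's native .npy format instead of pickling it
np.save('dataset_40000.npy', train_frames)

# Load it back; mmap_mode='r' memory-maps the file rather than reading everything into RAM
train_frames = np.load('dataset_40000.npy', mmap_mode='r')

Is something like this the recommended route, or is there a better option?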