I want to use numpy files (.npy) from Google Drive in Google Colab without loading them into RAM.
I am working on image classification and have my image data in four numpy files on Google Drive. The collective size of the files is greater than 14 GB, whereas Google Colab only offers 12 GB of RAM. Is there a way to use them by loading only a single batch at a time into RAM to train the model, and then removing it from RAM again (maybe similar to flow_from_directory)?
The problem with flow_from_directory is that it is very slow, even for a single block of VGG16 and even when the images are already in the Colab directory.
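What I have in mind is something that behaves like flow_from_directory but pulls batches out of the numpy arrays instead. A rough sketch of the idea, based on keras.utils.Sequence (the class name and batch size are just placeholders, and I don't have this working yet):

import numpy as np
from tensorflow.keras.utils import Sequence

class NpyBatchLoader(Sequence):
    # Yields (x, y) batches by slicing the arrays, so only one batch sits in RAM at a time.
    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # np.asarray copies only this slice into memory
        return np.asarray(self.x[sl]), np.asarray(self.y[sl])

The goal would be to call something like model.fit(NpyBatchLoader(X_train, Y_train)), provided X_train itself can be opened without reading it fully into RAM.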
I am using the Dogs vs. Cats classifier dataset from Kaggle:
! kaggle competitions download -c 'dogs-vs-cats'
I converted the image data into numpy arrays and saved them in four files (a simplified version of the conversion is shown after the list):
X_train - float32 - 10.62 GB - (18941, 224, 224, 3)
X_test - float32 - 3.4 GB - (6059, 224, 224, 3)
Y_train - float64 - 148 KB - (18941,)
Y_test - float64 - 47 KB - (6059,)
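For reference, the arrays were built roughly like this (the folder name and the PIL-based loop are simplified from my actual script):

import os
import numpy as np
from PIL import Image

def images_to_array(folder, size=(224, 224)):
    # Read every image in the folder, resize to 224x224, stack as float32
    arrays = [np.asarray(Image.open(os.path.join(folder, name)).resize(size),
                         dtype=np.float32)
              for name in sorted(os.listdir(folder))]
    return np.stack(arrays)

X_train = images_to_array('train')  # 'train' stands for the unzipped Kaggle image folder
np.save('Cat_Dog_Classifier/X_train.npy', X_train)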
When I run the following code, the session crashes with the error 'Your session crashed after using all available RAM.':
import numpy as np
X_train = np.load('Cat_Dog_Classifier/X_train.npy')
Y_train = np.load('Cat_Dog_Classifier/Y_train.npy')
X_test = np.load('Cat_Dog_Classifier/X_test.npy')
Y_test = np.load('Cat_Dog_Classifier/Y_test.npy')
Is there any way to use these 4 files without loading them into RAM?
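One idea I came across is opening the files as memory maps instead of reading them in, i.e. np.load with mmap_mode='r', and then feeding the memory-mapped arrays to a batch loader like the one sketched above, but I am not sure whether this is the right approach here:

import numpy as np

# mmap_mode='r' opens the .npy files read-only without copying them into RAM;
# only the slices that are actually indexed get read from disk.
X_train = np.load('Cat_Dog_Classifier/X_train.npy', mmap_mode='r')
Y_train = np.load('Cat_Dog_Classifier/Y_train.npy', mmap_mode='r')
X_test = np.load('Cat_Dog_Classifier/X_test.npy', mmap_mode='r')
Y_test = np.load('Cat_Dog_Classifier/Y_test.npy', mmap_mode='r')

# Reading one batch of 32 images would then only touch about 19 MB (32 * 224 * 224 * 3 * 4 bytes)
batch = np.array(X_train[:32])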