I have a dataset that is too large to load into memory all at once. My plan instead is to load half of the dataset, train for 2 epochs, delete that half, load the other half, train, and repeat. However, even though I delete the data after every 2 epochs, training still crashes because it runs out of RAM.
Training on each half of the dataset individually works fine. But when I put it in a loop that trains on one half, deletes it, then trains on the other half, it crashes. To be clear, it's not GPU memory I'm running out of, it's system RAM.
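A quick way to double-check that it really is system RAM (and whether the del calls actually release anything) is to print the process's resident memory between iterations. This is just a rough sketch, assuming psutil, which comes preinstalled on Colab:

    import gc
    import psutil

    def log_ram(tag):
        # Resident set size of the current process, in GiB
        rss_gib = psutil.Process().memory_info().rss / 1024**3
        print(f'{tag}: {rss_gib:.2f} GiB resident')

    # e.g. call gc.collect() and then log_ram('after del') right after the deletes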
Here is the training loop:
    import numpy as np
    from sklearn.model_selection import train_test_split

    for i in range(20):
        # Alternate between the two halves of the dataset saved on Drive
        batch = 1 if i % 2 == 0 else 2
        X_train = np.load(f'/content/drive/My Drive/Kaggle ISLR/X_train_batch{batch}.npy')
        y_train = np.load(f'/content/drive/My Drive/Kaggle ISLR/y_train_batch{batch}.npy')

        X_train, X_val, y_train, y_val = train_test_split(
            X_train, y_train, test_size=0.2, random_state=42
        )
        train_gen = MyDataGenerator(X_train, y_train, batch_size, first_dim)
        val_gen = MyDataGenerator(X_val, y_val, batch_size, first_dim)

        # Free the raw training arrays once the generators have been built
        del X_train
        del y_train

        model.fit(
            train_gen,
            validation_data=val_gen,
            epochs=2,
            callbacks=[checkpoint_callback]
        )

        # Free the generators before loading the next half
        del train_gen
        del val_gen
Is there a better way to do this to prevent running out of RAM?
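One alternative I'm wondering about, in case it's worth mentioning: memory-mapping the saved arrays instead of loading them, so only the slices a batch actually touches get read into RAM. This is just a sketch of the idea and assumes MyDataGenerator could be adapted to index into the memory-mapped arrays one batch at a time:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Memory-mapped: nothing is read into RAM until a slice is accessed
    X_all = np.load('/content/drive/My Drive/Kaggle ISLR/X_train_batch1.npy', mmap_mode='r')
    y_all = np.load('/content/drive/My Drive/Kaggle ISLR/y_train_batch1.npy', mmap_mode='r')

    # Split on indices rather than on the arrays themselves, so nothing is copied
    train_idx, val_idx = train_test_split(
        np.arange(len(X_all)), test_size=0.2, random_state=42
    )

    # The generator would then slice out each batch on demand, e.g.
    # X_all[batch_indices] inside __getitem__, instead of holding full arrays.

I'm not sure whether that is a sensible pattern here, so any guidance is appreciated.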