I am training a convolutional neural network on an input set that is 5 GB, and the training targets are the same size, so 10 GB of data in total. I am reserving about 50 GB of memory but still run into memory issues. I am using the Adam optimizer, and my model fitting looks like this:

cnn_model.fit(x_train,y_train, validation_data=(x_test,y_test), callbacks=[earlystopper], epochs=25)

Any idea how I can improve the situation? Someone at https://github.com/tensorflow/tensorflow/issues/18736 says "Adam and RMSProp are problematic because they memorize historical gradients". Any ideas on how to address that?

Baron Yugovich
  • Have you counted the memory used up by intermediate layers in both forward and backward propagation? – Autonomous Aug 15 '18 at 23:58
  • No, how would I do that, and how would I use that info? Please explain – Baron Yugovich Aug 16 '18 at 00:10
  • Let's say you take a `224 x 224 x 3` image as input and pass it through a `3 x 3 x 64` conv layer (with zero-padding to maintain the size) followed by a `2 x 2` maxpool; the output of these two layers will be `112 x 112 x 64`. Now, if you have 32 images, the output of the max-pool layer will hold `32 x 112 x 112 x 64 = 25690112` values, which is about 25 MB at one byte per value or about 100 MB as float32. You have to account for all such intermediate outputs. Use [this](https://stackoverflow.com/questions/43137288/how-to-determine-needed-memory-of-keras-model) to determine the memory used, since it can get quite complicated. You can reduce the batch size if needed. – Autonomous Aug 16 '18 at 00:17
  • Are there any flags I can pass to minimize storing of intermediate data, or saving previous iteration's state whenever possible? – Baron Yugovich Aug 16 '18 at 00:19
  • I think we need more information to help you, like which model are you training, the size and number of images (not total size as you gave), and training parameters. Are you training on CPU or GPU? Please also include the real unedited out of memory errors. – Dr. Snoopy Aug 16 '18 at 00:33
  • I am using CPU, training on 1000 images that are 2048x2048, total size 5 GB. – Baron Yugovich Aug 16 '18 at 00:44
  • What is the batch size, and how much CPU RAM do you have? Also, how much memory is being used if you set `batch_size = 1`? – Autonomous Aug 16 '18 at 02:45
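
The arithmetic from the comments above can be sketched in a few lines. This is only a rough estimate of the memory held by one layer's output; the helper name `activation_bytes` is hypothetical, float32 (4 bytes per value) is assumed, and the 64-filter full-resolution layer in the second example is an illustrative assumption, since the actual model was not posted. The batch size of 32 is the Keras default when `batch_size` is not passed to `fit`, as in the call in the question.

```python
def activation_bytes(batch, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one layer's output for a whole batch.

    bytes_per_value=4 assumes float32 activations.
    """
    return batch * height * width * channels * bytes_per_value


# Example from the comment: 32 images, 112 x 112 x 64 max-pool output.
pooled = activation_bytes(32, 112, 112, 64)
print(pooled / 2**20, "MiB")  # 98.0 MiB

# Assumed example: one 64-filter conv output at full 2048 x 2048
# resolution, with the Keras default batch size of 32.
full_res = activation_bytes(32, 2048, 2048, 64)
print(full_res / 2**30, "GiB")  # 32.0 GiB
```

Under these assumptions a single full-resolution layer already needs about 32 GiB for its output alone, before counting other layers, gradients, or optimizer state, which is why lowering `batch_size` is usually the first thing to try.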

1 Answer

You don't have to load the whole dataset at once! I would create a flow generator for training and testing. Something like this: https://www.kaggle.com/humananalog/keras-generator-for-reading-directly-from-bson
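
As a minimal sketch of that idea (not the code from the linked kernel), the generator below assumes the images and targets have been saved as two `.npy` files and reads only one batch at a time through a NumPy memory map. The function name `batch_generator`, the file paths, and the default batch size are all hypothetical.

```python
import numpy as np


def batch_generator(x_path, y_path, batch_size=4, shuffle=True, seed=0):
    """Yield (x, y) batches forever from two .npy files (hypothetical paths)."""
    # mmap_mode="r" maps the files instead of loading them fully into RAM.
    x = np.load(x_path, mmap_mode="r")
    y = np.load(y_path, mmap_mode="r")
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    while True:  # Keras expects a training generator to loop indefinitely
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            # Sorted indices keep the disk reads roughly sequential.
            idx = np.sort(order[start:start + batch_size])
            # Indexing with an index array copies only these samples into memory.
            yield np.asarray(x[idx]), np.asarray(y[idx])
```

With TensorFlow 2 this could be passed as `cnn_model.fit(batch_generator("x_train.npy", "y_train.npy"), steps_per_epoch=steps, ...)`, where `steps` is the number of batches per epoch; Keras versions from 2018 used `fit_generator` for the same purpose. Validation data can be fed the same way from a second generator together with `validation_steps`.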

André Guerra