
I am doing hyperparameter optimization using Bayesian optimization in TensorFlow for my convolutional neural network (CNN), and I am getting this error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4136,1,180,432] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

These are the hyperparameters I am optimizing:

from skopt.space import Integer, Real, Categorical

dim_batch_size = Integer(low=1, high=50, name='batch_size')
dim_kernel_size1 = Integer(low=1, high=75, name='kernel_size1')
dim_kernel_size2 = Integer(low=1, high=50, name='kernel_size2')
dim_depth = Integer(low=1, high=100, name='depth')
dim_num_hidden = Integer(low=5, high=1500, name='num_hidden')
dim_num_dense_layers = Integer(low=1, high=5, name='num_dense_layers')
dim_learning_rate = Real(low=1e-6, high=1e-2, prior='log-uniform',
                         name='learning_rate')
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
dim_max_pool = Integer(low=1, high=100, name='max_pool')

dimensions = [dim_batch_size,
              dim_kernel_size1,
              dim_kernel_size2,
              dim_depth,
              dim_num_hidden,
              dim_num_dense_layers,
              dim_learning_rate,
              dim_activation,
              dim_max_pool]
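
Roughly, these dimensions are passed to the optimizer like this (a minimal sketch; fitness stands in for my objective function, which builds and trains the CNN and returns the validation loss):

from skopt import gp_minimize

# Run Bayesian optimization over the search space defined above.
search_result = gp_minimize(
    func=fitness,            # objective: trains the CNN, returns validation loss
    dimensions=dimensions,   # the search space defined above
    acq_func='EI',           # expected-improvement acquisition function
    n_calls=40)              # number of hyperparameter combinations to try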

It says the resource is exhausted. Why is this?

Is it because I am optimizing too many hyperparameters? Is there some dimension mismatch? Or did I assign a hyperparameter range that is beyond what is allowed for correct operation?

  • Sorry, I don't have an answer to your question, but I was wondering if you wouldn't mind sharing your full code. I am trying to do the same thing, following what was done in https://github.com/Vooban/Hyperopt-Keras-CNN-CIFAR-100/blob/master/neural_net.py, but I would love to see other examples – MikeDoho Nov 01 '18 at 22:11
  • @MikeDoho did you manage to make the hyperopt work? – Suleka_28 Jan 17 '19 at 04:53
  • @Suleka_28 I was actually! If you are asking because you wanted to help then thank you! If you were like me and wanted another example then let me know – MikeDoho Jan 18 '19 at 06:22

1 Answer


The OOM happens because when several hyperparameters are at the high end of their ranges, the model becomes too large: for example, a batch size around 50, dim_num_hidden around 1500, and so on. The number of hyperparameters doesn't matter; just a few of them at their maximum values can be enough to blow up the model.

The concrete tensor from the error message is [4136,1,180,432], or about 1.2 GB if you're using 32-bit floats. That is a lot, and it's just one of the many tensors needed for training the network (e.g., the forward and backward passes roughly double the memory, because gradients have to be stored alongside activations). No wonder TensorFlow failed with OOM.
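
As a quick sanity check, here is the arithmetic behind that figure (a minimal sketch assuming float32, i.e. 4 bytes per element):

# Rough size of the single tensor from the error message, assuming float32.
shape = (4136, 1, 180, 432)

num_elements = 1
for dim in shape:
    num_elements *= dim              # ~3.2e8 elements

size_bytes = num_elements * 4        # 4 bytes per float32 element
print(size_bytes / 1024 ** 3)        # ~1.2 GB for this single tensor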

One particular issue with Bayesian optimization for hyper-parameter tuning is that the algorithm is very likely to probe the corners of the hyper-space, i.e., points where the values are close to the minimum of their ranges or close to the maximum of their ranges. See the details in this question. This means you have to limit at least one hyper-parameter (usually the batch size) so that the model fits in memory even when everything else is at its maximum. Alternatively, you can compute a safe value before building the model on each iteration, but then the algorithm won't really be optimizing the batch size.
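
For the first option, a minimal sketch of capping the sampled batch size inside the objective (MAX_SAFE_BATCH and build_and_train_model are placeholders, not taken from the question):

from skopt.utils import use_named_args

MAX_SAFE_BATCH = 16  # assumed value: found empirically so the largest model still fits

@use_named_args(dimensions=dimensions)
def fitness(batch_size, kernel_size1, kernel_size2, depth, num_hidden,
            num_dense_layers, learning_rate, activation, max_pool):
    # Clamp the sampled batch size to a value known to fit in GPU memory.
    batch_size = min(batch_size, MAX_SAFE_BATCH)
    # build_and_train_model is a placeholder for your own training code;
    # it should build the CNN, train it, and return the validation loss.
    return build_and_train_model(batch_size, kernel_size1, kernel_size2,
                                 depth, num_hidden, num_dense_layers,
                                 learning_rate, activation, max_pool)

The downside, as noted above, is that the optimizer still samples batch_size but its effect is truncated at MAX_SAFE_BATCH, so that dimension is not genuinely optimized.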

Maxim