
I want to run an implementation of the YOLO algorithm (object detection) with Keras. The code I use comes mostly from here.

I am trying to train my model on a sample of Google's Open Images Dataset V4. The problem is that when I try to train it, I get the following warnings and exception:

W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 831.81MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 380.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 84.50MiB.  Current allocation summary follows.
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8,64,208,208] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node conv2d_3/Conv2D}} = Conv2D[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/conv2d_3/Conv2D_grad/Conv2DBackpropInput"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/Conv2D/ReadVariableOp)]]
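
For scale, the tensor in the exception accounts exactly for the 84.50MiB allocation in the last warning, assuming float here means 4-byte float32:

# Memory needed for a single tensor of shape [8, 64, 208, 208] in float32:
# 8 * 64 * 208 * 208 elements * 4 bytes each
print(8 * 64 * 208 * 208 * 4 / 1024**2)  # -> 84.5 (MiB)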

(Here I am using the tensorflow-gpu package, but I get a similar error with the non-GPU tensorflow as well.)

At first I thought it was because of the size of my dataset (200,000 pictures => ~60GB), but when I run the code on a minimal sample (500 pictures => ~150MB), I get exactly the same error. So I guess there is a problem with my code.

Here is a minimal example of the part I suspect is the problem:

import numpy as np
from keras.optimizers import Adam

def _main():

    input_shape = [416,416]
    model = ### # Create YOLO model
    anchors = ### # Collection of 9 anchors
    num_classes = 601
    train_data = ### # A collection of the form [PathToImage, X1,X2,Y1,Y2, class], where the X,Y values define the bounding box
    valid_data = ### # A collection of the form [PathToImage, X1,X2,Y1,Y2, class], where the X,Y values define the bounding box
    batch_size = 8

    # First training phase, with most layers frozen
    model.fit_generator(data_generator(train_data, batch_size, input_shape, anchors, num_classes),
            steps_per_epoch=max(1, len(train_data)//batch_size),
            validation_data=data_generator(valid_data, batch_size, input_shape, anchors, num_classes),
            validation_steps=max(1, len(valid_data)//batch_size),
            epochs=50,
            initial_epoch=0)

    # Unfreeze and continue training, to fine-tune.
    for i in range(len(model.layers)):
        model.layers[i].trainable = True
    model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
    print('Unfreeze all of the layers.')

    print('Train on {} samples, val on {} samples, with batch size {}.'.format(len(train_data), len(valid_data), batch_size))
    model.fit_generator(data_generator(train_data, batch_size, input_shape, anchors, num_classes),
        steps_per_epoch=max(1, len(train_data)//batch_size),
        validation_data=data_generator(valid_data, batch_size, input_shape, anchors, num_classes),
        validation_steps=max(1, len(valid_data)//batch_size),
        epochs=100,
        initial_epoch=50)

def data_generator(lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i==0:
                np.random.shuffle(lines)
            image, box = get_data(lines[i], input_shape) # Load the image from its path and return it with its bounding box (the object class is stored in the box object)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) # For each box, find the best anchor
        yield [image_data, *y_true], np.zeros(batch_size)
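
For reference, a quick way to sanity-check what one batch from the generator looks like (a sketch; train_data, anchors, etc. are the placeholders from above):

gen = data_generator(train_data, batch_size, input_shape, anchors, num_classes)
batch_inputs, dummy_targets = next(gen)
print(batch_inputs[0].shape)  # image batch; should be (8, 416, 416, 3) if get_data resizes to input_shape
print(len(batch_inputs) - 1)  # number of y_true arrays, one per YOLO output scale
print(dummy_targets.shape)    # dummy targets for the custom loss; (8,)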

The OOM exception is raised on the second call to fit_generator(), i.e. once all the layers have been unfrozen, so gradients and optimizer state for every layer also have to fit in GPU memory.

Following an answer to a similar question, I added the allow_growth GPU option to my TensorFlow session:

import tensorflow as tf
from keras import backend as K

K.clear_session() # get a new session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True # allocate GPU memory on demand instead of all at once
sess = tf.Session(config=config)

K.set_session(sess)

But it did not solve the problem.
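
(For completeness, the other session option I am aware of is a hard cap on how much GPU memory TensorFlow may claim; a sketch, with an arbitrary 0.9 fraction, though I would not expect it to fix an OOM by itself:)

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # cap TensorFlow at ~90% of GPU memory (arbitrary value)
sess = tf.Session(config=config)
K.set_session(sess)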

So I am a bit stuck here. What am I doing wrong?

Notes:

  • I have a Quadro P1000 GPU with 20GB GPU Memory (according to the Windows task manager)
  • I have 32GB RAM
  • I haven't changed the model architecture; you can find it here
  • Use a smaller batch size; this can easily give you an OOM if your GPU memory is not enough for the batch size – FindOutIslamNow Apr 18 '19 at 11:36
  • I actually have a batch size of 8. Isn't that low enough? – Nakeuh Apr 18 '19 at 11:37
  • Possible duplicate of [Tensorflow: ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure](https://stackoverflow.com/questions/45625691/tensorflow-ran-out-of-memory-trying-to-allocate-3-90gib-the-caller-indicates-t) – nickyfot Apr 18 '19 at 11:40
  • Check that no other kernal is using GPU, even restart your machine to make sure the GPU is available at its entire memory. (maybe some leak) – FindOutIslamNow Apr 18 '19 at 11:41
  • I actually misread the Windows task manager information (I should have read the 'dedicated memory' part). So I actually have 4GB of GPU memory. After more tests, it looks like I can train my model using either the GPU with a batch size of 1, or the CPU with a batch size of 8. (The GPU is still faster.) So I guess the model I want to train is too complex for my resources. :/ – Nakeuh Apr 18 '19 at 12:13
