
I am working with Keras 2.0.0 and I'd like to train a deep model with a huge number of parameters on a GPU. With images that are too large I run out of memory (OOM); with images that are too small the model's accuracy is worse than it could be. Therefore I'd like to find the biggest input image size that still fits into my GPU's memory. Is there any functionality that calculates the memory requirement (e.g. comparable to model.summary()) given the model and the input data?

I appreciate your help.

D.Laupheimer
  • You can look at how they compute the memory usage [here](http://cs231n.github.io/convolutional-networks/#case); you could also try to reduce the batch size instead of the resolution. – Merwann Selmani Mar 31 '17 at 09:55
  • Thanks for your answer. Actually, I had read the given link before posting my question, but I wanted to avoid computing it manually. :D Also, I don't want to reduce the batch size, as I want a good representation of my whole data set in a statistical sense. – D.Laupheimer Mar 31 '17 at 10:09
  • Trial and error will be the fastest way to answer your question. Keras isn't the computation library; it's only a wrapper around the backend you chose, and memory management is handled differently by different backends. Memory consumption also doesn't depend only on the number of parameters: an LSTM will use a lot of memory even if its parameter count is low. You should just try it and see the actual memory consumption. :) – Nassim Ben Mar 31 '17 at 11:11
  • I was afraid of that. But I will do so...thanks! (Nassim, you like answering my questions, don't you? :D) – D.Laupheimer Mar 31 '17 at 11:53
  • love it :-) have fun – Nassim Ben Mar 31 '17 at 11:58
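
As Nassim Ben suggests above, trial and error is the quickest way to find the largest workable input size: build the model at increasing resolutions and stop at the first out-of-memory error. Below is a minimal sketch of that approach; it assumes a TensorFlow backend, and `build_model` is a hypothetical factory that you would replace with your own architecture.

import numpy as np
import tensorflow as tf

def build_model(input_size):
    # Hypothetical placeholder architecture; replace with your own model.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu',
                               input_shape=(input_size, input_size, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

def largest_fitting_input_size(candidate_sizes, batch_size=32):
    largest = None
    for size in sorted(candidate_sizes):
        tf.keras.backend.clear_session()  # free graph/GPU state from the previous attempt
        try:
            model = build_model(size)
            model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
            x = np.random.rand(batch_size, size, size, 3).astype('float32')
            y = np.random.randint(0, 10, size=batch_size)
            model.train_on_batch(x, y)  # a real training step forces memory allocation
            largest = size
        except tf.errors.ResourceExhaustedError:
            break
    return largest

print(largest_fitting_input_size([128, 256, 384, 512]))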

4 Answers


I created a complete function based on the answer of Fabrício Pereira.

def get_model_memory_usage(batch_size, model):
    import numpy as np
    try:
        from keras import backend as K
    except ImportError:
        from tensorflow.keras import backend as K

    shapes_mem_count = 0
    internal_model_mem_count = 0
    for l in model.layers:
        layer_type = l.__class__.__name__
        if layer_type == 'Model':
            # Nested models are handled recursively; their estimate (in GB) is added at the end.
            internal_model_mem_count += get_model_memory_usage(batch_size, l)
        # Count the number of values in this layer's output, ignoring the batch dimension (None).
        single_layer_mem = 1
        out_shape = l.output_shape
        if type(out_shape) is list:
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = np.sum([K.count_params(p) for p in model.trainable_weights])
    non_trainable_count = np.sum([K.count_params(p) for p in model.non_trainable_weights])

    # Bytes per value, depending on the configured float precision.
    number_size = 4.0
    if K.floatx() == 'float16':
        number_size = 2.0
    if K.floatx() == 'float64':
        number_size = 8.0

    # Activations scale with the batch size; the weights do not.
    total_memory = number_size * (batch_size * shapes_mem_count + trainable_count + non_trainable_count)
    gbytes = np.round(total_memory / (1024.0 ** 3), 3) + internal_model_mem_count
    return gbytes

UPDATE 2019.10.06: Added support for models which contain other models as layers.

UPDATE 2020.07.17: Function now works correctly in TensorFlow v2.
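
Usage might look like this (a small sketch assuming tf.keras; the architecture is arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(512, 512, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Estimated memory in GB for training with a batch size of 32.
print(get_model_memory_usage(32, model))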

ZFTurbo
  • The calculation makes sense, but for some reason it seems to report memory usage far beyond what my GPU has, while Keras is happily training on it. E.g. `get_model_memory_usage(batch_size, model)` => 28 GB, while my GTX 1060 has 6 GB. :) – Alon Burg Oct 16 '17 at 07:36
  • I wonder if it could be related to the loss function, which might require some memory... – Alon Burg Oct 16 '17 at 08:18
  • Probably Theano or TensorFlow don't keep all intermediate shapes in memory at once, only the two shapes involved in the calculation of the current layer. In that case, to find the memory required by the shapes we would need the two largest consecutive shape volumes. – ZFTurbo Oct 16 '17 at 17:18
  • There's also memory needed for the result of every layer, and for the gradients as well, so this is incorrect. – UpmostScarab Nov 06 '17 at 14:10
  • Gradients are covered by the `shapes_mem_count` part. I don't think we really need to store intermediate results for layers. – ZFTurbo Nov 06 '17 at 20:47
  • Shouldn't it be `total_memory = 4.0 * (batch_size * shapes_mem_count + trainable_count + non_trainable_count)`? The weights are shared among all batches; no matter the batch size, the weights take up the same amount of memory, so there is no need to multiply them by the batch size. – mkuse Nov 30 '18 at 05:12
  • Would be nice to include the `queue_size` for models using a batch generator. – b-fg Mar 25 '19 at 12:10
  • Note that if you are specifying your batch size in the model (by `batch_input_shape` or `batch_shape`) you shouldn't multiply by the batch size again for the calculation (passing 1 to the function will do). – timakro Jun 19 '19 at 08:53
  • Getting: (` trainable_count = np.sum([K.count_params(p) for p in set(model.trainable_weights)]) File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py", line 1089, in __hash__ raise TypeError("Variable is unhashable if Tensor equality is enabled. " `) – GuySoft Jul 07 '20 at 12:38
  • I made an update for TF2. Looks like it now works fine. – ZFTurbo Jul 17 '20 at 15:21

Hope this can help you...

  • Here is how to determine the number of output-shape elements of your Keras model (variable `model`); each element occupies 4 bytes in memory:

    import numpy

    shapes_count = int(numpy.sum([numpy.prod(numpy.array([s if isinstance(s, int) else 1 for s in l.output_shape]))
                                  for l in model.layers]))

    memory = shapes_count * 4

  • And here is how to determine the number of parameters of your Keras model (variable `model`):

    from keras import backend as K

    trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.trainable_weights)]))

    non_trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.non_trainable_weights)]))
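
Putting the two snippets above together gives a rough total, for example (a sketch that assumes float32 values and an already-built Keras model in the variable `model`):

import numpy
from keras import backend as K

# Output-shape elements (activations) plus weights, 4 bytes each for float32.
shapes_count = int(numpy.sum([numpy.prod(numpy.array([s if isinstance(s, int) else 1 for s in l.output_shape]))
                              for l in model.layers]))
trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.trainable_weights)]))
non_trainable_count = int(numpy.sum([K.count_params(p) for p in set(model.non_trainable_weights)]))

total_bytes = 4 * (shapes_count + trainable_count + non_trainable_count)
print(round(total_bytes / (1024.0 ** 3), 3), 'GB')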

Fabrício Pereira

Here is my variant of @ZFTurbo's answer. It offers better handling for nested Keras models, different TensorFlow dtypes, and removes the dependency on NumPy. I've written and tested this on TensorFlow 2.3.0, and it may not work on earlier versions.

import tensorflow as tf


def keras_model_memory_usage_in_bytes(model, *, batch_size: int):
    """
    Return the estimated memory usage of a given Keras model in bytes.
    This includes the model weights and layers, but excludes the dataset.

    The model shapes are multiplied by the batch size, but the weights are not.

    Args:
        model: A Keras model.
        batch_size: The batch size you intend to run the model with. If you
            have already specified the batch size in the model itself, then
            pass `1` as the argument here.
    Returns:
        An estimate of the Keras model's memory usage in bytes.

    """
    default_dtype = tf.keras.backend.floatx()
    shapes_mem_count = 0
    internal_model_mem_count = 0
    for layer in model.layers:
        if isinstance(layer, tf.keras.Model):
            internal_model_mem_count += keras_model_memory_usage_in_bytes(
                layer, batch_size=batch_size
            )
        single_layer_mem = tf.as_dtype(layer.dtype or default_dtype).size
        out_shape = layer.output_shape
        if isinstance(out_shape, list):
            out_shape = out_shape[0]
        for s in out_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = sum(
        [tf.keras.backend.count_params(p) for p in model.trainable_weights]
    )
    non_trainable_count = sum(
        [tf.keras.backend.count_params(p) for p in model.non_trainable_weights]
    )

    total_memory = (
        batch_size * shapes_mem_count
        + internal_model_mem_count
        + trainable_count
        + non_trainable_count
    )
    return total_memory
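
For example, with a small toy model (a sketch; the architecture is arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(256, 256, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

bytes_needed = keras_model_memory_usage_in_bytes(model, batch_size=64)
print(f"Estimated memory: {bytes_needed / 2 ** 30:.3f} GiB")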

James Mishra
  • I have a question about `internal_model_mem_count`: it seems to always be 0? What does this mean, and why compute it? Thank you. – Jing Oct 04 '21 at 15:45
  • @Jing, it is possible to use a Keras model as an individual layer in a larger model. To account for this, `keras_model_memory_usage_in_bytes()` recursively calls itself to measure memory usage and tracks nested model memory usage in the `internal_model_mem_count` variable. – James Mishra Oct 06 '21 at 09:38
  • The calculation doesn't seem correct: my basic UNET model (disk size of 1 GB) with batch size 1 yields 101,594,448,001 bytes, which is about 100 GB, yet it trains fine with 16 GB of RAM or a 12 GB NVIDIA GPU. – illan Sep 27 '22 at 15:54

I believe that using a data generator, either custom written or one of the existing generators in Keras, will resolve your issue. Memory errors often arise when all of the loaded data becomes overwhelming for the system; a generator instead breaks the dataset down into batches, so you won't run out of memory and will be able to train on any system.
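
If the OOM comes from loading the entire dataset into memory at once, a generator can help. Below is a minimal sketch of a custom generator based on `tf.keras.utils.Sequence`; the file-path and label handling (`image_paths`, `labels`) are hypothetical, and the code is written against tf.keras 2.x.

import math
import numpy as np
import tensorflow as tf

class ImageBatchGenerator(tf.keras.utils.Sequence):
    """Loads images from disk one batch at a time instead of all at once."""

    def __init__(self, image_paths, labels, batch_size, target_size=(256, 256)):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = start + self.batch_size
        images = np.stack([
            tf.keras.preprocessing.image.img_to_array(
                tf.keras.preprocessing.image.load_img(path, target_size=self.target_size)
            ) / 255.0
            for path in self.image_paths[start:end]
        ])
        return images, np.asarray(self.labels[start:end])

# model.fit(ImageBatchGenerator(train_paths, train_labels, batch_size=16), epochs=10)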

  • This advice is incorrect. While data generators or small batch sizes can help reduce memory usage, it is already common for research-grade models to consume more memory than consumer-grade GPUs can offer. – James Mishra Oct 18 '20 at 12:19