
import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)

    with tf.device('/cpu:0'):
        base_model = MobileNet(alpha=1, weights=None,
                               input_tensor=tf.placeholder('float32', shape=(1, 224, 224, 3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))

When I run the code above, I get the following result:

1,137,481,704 --- 4,253,864

This is different from the flops reported in the papers:

mobilenet: https://arxiv.org/pdf/1704.04861.pdf

ShuffleNet: https://arxiv.org/pdf/1707.01083.pdf

How can I calculate the exact flops described in the papers?

Ioannis Nasios
Y. Han
  • You can use following pip package to get some basic information like model's memory requirement, no. of parameters, flops etc. https://pypi.org/project/model-profiler – Talha Ilyas Apr 08 '21 at 07:32

4 Answers


tl;dr You've actually got the right answer! You are simply comparing flops with multiply accumulates (from the paper) and therefore need to divide by two.

If you're using Keras, then the code you listed is slightly over-complicating things...

Let model be any compiled Keras model. We can arrive at the flops of the model with the following code.

import tensorflow as tf
import keras.backend as K


def get_flops():
    run_meta = tf.RunMetadata()
    opts = tf.profiler.ProfileOptionBuilder.float_operation()

    # We use the Keras session graph in the call to the profiler.
    flops = tf.profiler.profile(graph=K.get_session().graph,
                                run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops  # Prints the "flops" of the model.


# .... Define your model here ....
# You need to have compiled your model before calling this.
print(get_flops())

However, looking at my own example (not MobileNet) on my machine, the printed total_float_ops was 2115, and I got the following when I simply printed the flops variable:

[...]
Mul                      1.06k float_ops (100.00%, 49.98%)
Add                      1.06k float_ops (50.02%, 49.93%)
Sub                          2 float_ops (0.09%, 0.09%)

It's pretty clear that the total_float_ops property takes into consideration multiplication, addition and subtraction.
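The rounded display is consistent with the total. The profiler rounds per-op counts to three significant figures ("1.06k"), so the exact counts below are hypothetical, but, for example, 1057 Mul, 1056 Add and 2 Sub ops sum to 2115 and reproduce the printed percentages, where the first number in each pair accumulates from that row downwards:

```python
# Hypothetical exact op counts; the profiler displays them rounded to "1.06k".
counts = [("Mul", 1057), ("Add", 1056), ("Sub", 2)]
total = sum(n for _, n in counts)  # 2115, matching total_float_ops

remaining = total
for name, n in counts:
    cumulative = 100.0 * remaining / total  # first percentage: this row plus rows below
    share = 100.0 * n / total               # second percentage: this row alone
    print(f"{name:<4} {n:>5} float_ops ({cumulative:.2f}%, {share:.2f}%)")
    remaining -= n
```

Running this prints `Mul  1057 float_ops (100.00%, 49.98%)`, `Add  1056 float_ops (50.02%, 49.93%)`, `Sub  2 float_ops (0.09%, 0.09%)`, matching the report above.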

I then went back to the MobileNet example. Skimming the paper, I identified which model in its table corresponds to the default Keras implementation of MobileNet, based on the number of parameters:

[Table from the MobileNet paper: model variants with their Mult-Adds and parameter counts]

The first model in the table matches your result (4,253,864 parameters), and its Mult-Adds are approximately half of your flops result. So you have the correct answer; you were simply mistaking flops for Mult-Adds (a.k.a. multiply accumulates, or MACs).

If you want to compute the number of MACs you simply have to divide the result from the above code by two.
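Applied to the numbers from the question, dividing by two lands almost exactly on the 569 million Mult-Adds the paper reports for MobileNet-224:

```python
# flops reported by tf.profiler for MobileNet (alpha=1, 224x224), from the question
flops = 1_137_481_704

macs = flops // 2  # one multiply-accumulate = one multiply + one add
print(f"{macs:,}")            # 568,740,852
print(f"~{macs / 1e6:.0f}M")  # ~569M, matching the paper's Mult-Adds column
```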


Important Notes

Keep the following in mind if you are trying to run the code sample:

  1. The code sample was written in 2018 and doesn't work with TensorFlow version 2. See @driedler's answer for a complete example that is compatible with TensorFlow version 2.
  2. The code sample was originally meant to be run once on a compiled model. For a way to use it without side effects (so it can be run multiple times on the same model), see @ch271828n's answer.
Malcolm
    why do you need `model` as an argument for `get_flops`? – gizzmole Jun 13 '19 at 12:21
    On Tensorflow 2.1.0 it gives an error: `AttributeError: module 'tensorflow' has no attribute 'RunMetadata'` – Qin Heyang Feb 08 '20 at 00:40
  • Number of MACs are not equal to `parameters/2 `. In the paper, you can see `parameters` are 4.2 Million and `Mult-Adds` are 569 Million. So this approach is wrong to calculate number of MACs. I am not sure about the right one too. – Awais Hussain Mar 13 '20 at 06:04
  • @gizzmole ...idk, lol. Maybe my original line of reasoning was that it has to be a compiled model when you call the function? @QinHeyang it's totally possible that they deprecated or deleted the module. In fact just browsing the docs and it's in `tf.compat.v1.RunMetadata` in `2.1.0`. Maybe there is a better alternative in the v2 documentation... If you find something I can edit my answer or you can answer it separately. @AwaisHussain it's not `parameters / 2`, if you re-read my response you'll see I don't claim that. – Malcolm Apr 08 '20 at 20:23

This is working for me in TF-2.1:

import tensorflow as tf


def get_flops(model_h5_path):
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = tf.keras.models.load_model(model_h5_path)

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

            return flops.total_float_ops
driedler
  • Thanks for your modification, when I run it on a model which has around 18 convolution layers, I get the following ==================Model Analysis Report====================== Profile: _TFProfRoot 0 float_ops (0.00%, 0.00%) ======================End of Report========================== something doesn't seem to right. my model summary prints Total params: 176,240 Trainable params: 176,240 Non-trainable params: 0 – abacusreader Mar 03 '20 at 02:38
  • I am not sure. Perhaps an issue with the TF 1.x profiler? Note that supposedly they're working on a TF 2.x profiler: https://github.com/tensorflow/tensorflow/issues/32809 – driedler Mar 04 '20 at 01:53
    I got it working, the issue is I was trying to pass the model on the fly after it was generated and compiled. However this code works only if the model is saved to file and then reloaded. – abacusreader Mar 04 '20 at 04:30
    Hi, it seems that **this cannot be run twice, otherwise the flops will accumulate**... I have provided a little improvement, see my answer :) – ch271828n May 02 '20 at 00:39
  • This does not produce the correct results; it's printing something a lot smaller than the actual FLOPS. – buped82 May 30 '22 at 18:23

The above solutions cannot be run twice, otherwise the flops will accumulate! (In other words, the second time you run it, you will get output = flops_of_1st_call + flops_of_2nd_call.) The following code calls reset_default_graph to avoid this.

import tensorflow as tf
import keras


def get_flops():
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = keras.applications.mobilenet.MobileNet(
                    alpha=1, weights=None, input_tensor=tf.compat.v1.placeholder('float32', shape=(1, 224, 224, 3)))

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    tf.compat.v1.reset_default_graph()

    return flops.total_float_ops

Modified from @driedler, thanks!
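The accumulation problem can be seen without TensorFlow at all. This toy sketch (plain Python, with stand-in names like `_default_graph`) mimics how a global default graph keeps collecting ops across calls, so profiling it twice double-counts unless it is reset:

```python
# Toy illustration of why the earlier snippets accumulate: the profiler
# walks the *default graph*, which is global state holding every op ever added.
_default_graph = []  # stands in for TensorFlow's global default graph


def build_model():
    # Building a "model" adds its ops to the global default graph.
    _default_graph.extend(["Conv", "Add", "Mul"])


def profile():
    # The "profiler" counts everything currently in the graph.
    return len(_default_graph)


def reset_default_graph():
    _default_graph.clear()


build_model()
first = profile()        # 3 ops
build_model()            # second build without a reset...
second = profile()       # ...now counts 6 ops: old + new

reset_default_graph()
build_model()
after_reset = profile()  # back to 3
```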

ch271828n
  • Can you add the part where you are getting `U.TimingManager` too? – Jash Shah Aug 19 '20 at 12:25
    @JashShah That is nothing but timing how long the line takes to execute, so I removed it (safely). – ch271828n Aug 19 '20 at 12:34
  • Thanks for sharing. Is there a relationship between the input shape and the total flops number? When I changed the input shape, I get the same flops. – asendjasni Mar 09 '22 at 10:53
  • @asendjasni I guess there should. Maybe ask a separate question for your observation? – ch271828n Mar 10 '22 at 00:51
  • Btw, could you explain what the meaning of this output `6.95m float_ops (100.00%, 98.81%)`? Thanks. – asendjasni Mar 11 '22 at 10:33
  • This does not produce the correct results; it's printing something a lot smaller than the actual FLOPS. – buped82 May 30 '22 at 18:23
  • @asendjasni I think it's 6.95m float ops (100.0% [accumulated percent of FLOPS], 98.81% [percent of total flops]... You can see it better with the following sample: [...] Mul 1.06k float_ops (100.00%, 49.98%) Add 1.06k float_ops (50.02%, 49.93%) Sub 2 float_ops (0.09%, 0.09%) The bottom row has 0.09 in both sections of the parentheses, and the first value in the tupel increases row over row, accumulating what is below it. – Malcolm Aug 24 '22 at 04:51
  • @buped82 it's hard to know what to respond with without any clarification... this module is pretty lightly documented even in the core tensorflow library: https://www.tensorflow.org/api_docs/python/tf/compat/v1/profiler/profile You might have better luck opening an issue in the tensorflow repo if you suspect a bug? – Malcolm Aug 24 '22 at 04:54
  • @Malcolm thank you for the clarification. I may have a related question. I'm trying to calculate the # of FLOPs and the number of parameters, it appears that they also the same with a small margin. Do you have an idea why ? Besides, in many references, they use **G** as the unit of FLOPs, so why here I see an **M**? – asendjasni Aug 26 '22 at 13:30
  • @asendjasni I recommend you ask a separate question on stack overflow for the number of FLOPs vs number of parameters, as it is unclear what information you are missing, or what your question really is... this is also getting pretty off topic from the original question. The `G` vs the `M` is just the metric prefixes Mega vs Giga: https://en.wikipedia.org/wiki/Metric_prefix – Malcolm Aug 30 '22 at 02:52
  • This doesn't work for me, I'm using 2.8.0 with GPU support. `tensorflow/core/common_runtime/gpu/gpu_cudamallocasync_allocator.cc:390] Trying to set the stream twice. This isn't supported. Aborted (core dumped)` – darrahts Aug 09 '23 at 17:22

You can use model.summary() on all Keras models to get the number of FLOPS.

Pedram