Questions tagged [flops]

FLOPS (FLoating point Operations Per Second): a unit of measurement used to quantify the performance of the implementation of a numerical algorithm.


See Wikipedia page on FLOPS.
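
As a rough illustration of the unit described above, the sketch below computes a theoretical peak rate and an achieved rate. The function names, the machine parameters (4 cores, 3.0 GHz, 8 FLOPs per cycle per core) and the operation count are all hypothetical values chosen for the example, not figures taken from any question on this page.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock rate x FLOPs per cycle per core."""
    return cores * clock_hz * flops_per_cycle


def achieved_flops(operation_count: int, elapsed_seconds: float) -> float:
    """Achieved rate: operations actually performed divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return operation_count / elapsed_seconds


# Hypothetical machine: 4 cores at 3.0 GHz, 8 FLOPs per cycle per core.
peak = peak_flops(cores=4, clock_hz=3.0e9, flops_per_cycle=8)

# Hypothetical run: 2e10 floating-point operations completed in 0.5 s.
rate = achieved_flops(operation_count=20_000_000_000, elapsed_seconds=0.5)

# Fraction of the theoretical peak that the run reached.
efficiency = rate / peak
```

The ratio of achieved to peak rate is what a plot of FLOPS against problem size usually conveys: how close an implementation gets to what the hardware could do in principle.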

132 questions
696
votes
4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…
user1059432
  • 7,518
  • 3
  • 19
  • 16
60
votes
2 answers

FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

I'm confused on how many flops per cycle per core can be done with Sandy-Bridge and Haswell. As I understand it with SSE it should be 4 flops per cycle per core for SSE and 8 flops per cycle per core for AVX/AVX2. This seems to be verified…
user2088790
52
votes
9 answers

What is FLOP/s and is it a good measure of performance?

I've been asked to measure the performance of a Fortran program that solves differential equations on a multi-CPU system. My employer insists that I measure FLOP/s (floating-point operations per second) and compare the results with benchmarks (LINPACK)…
caglarozdag
  • 699
  • 1
  • 8
  • 13
43
votes
3 answers

What is FLOPS in the field of deep learning?

What is FLOPS in the field of deep learning? Why don't we just use the term FLO? We use the term FLOPS to measure the number of operations of a frozen deep learning network. According to Wikipedia, FLOPS = floating point operations per second. When we test…
ladofa
  • 561
  • 1
  • 5
  • 14
31
votes
6 answers

What's the relative speed of floating point add vs. floating point multiply

A decade or two ago, it was worthwhile to write numerical code to avoid using multiplies and divides and use addition and subtraction instead. A good example is using forward differences to evaluate a polynomial curve instead of computing the…
J. Peterson
  • 1,996
  • 1
  • 24
  • 21
17
votes
4 answers

how to calculate a Mobilenet FLOPs in Keras

run_meta = tf.RunMetadata() with tf.Session(graph=tf.Graph()) as sess: K.set_session(sess) with tf.device('/cpu:0'): base_model = MobileNet(alpha=1, weights=None, input_tensor=tf.placeholder('float32', shape=(1,224,224,3))) opts…
Y. Han
  • 171
  • 1
  • 1
  • 4
14
votes
7 answers

How to compare performance of two pieces of codes

I have a friendly competition with a couple of guys in the field of programming, and recently we have become very interested in writing efficient code. Our challenge was to try to optimize the code (in the sense of CPU time and complexity) at any cost…
Pouya
  • 1,266
  • 3
  • 18
  • 44
12
votes
1 answer

Counting the number of multiply-add operations (MAC) in Caffe CNN's architecture

Lately I've been benchmarking some CNNs regarding time, # of multiply-add operations (MAC), # of parameters and model size. I have seen some similar SO questions (here and here) and in the latter, they suggest using Netscope CNN Analyzer. This tool…
rafaspadilha
  • 629
  • 6
  • 20
12
votes
3 answers

How many FLOPs does tanh need?

I would like to compute how many FLOPs each layer of LeNet-5 (paper) needs. Some papers give total FLOPs for other architectures (1, 2, 3). However, those papers don't give details on how to compute the number of FLOPs and I have no idea how many…
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
10
votes
5 answers

How to measure FLOPS

How do I measure FLOPS or IOPS? If I measure the time taken by ordinary floating point addition/multiplication, is that equivalent to FLOPS?
Madhumitha B
  • 153
  • 1
  • 2
  • 12
10
votes
2 answers

Determine FLOPS of our ASM program

We had to implement an ASM program for multiplying sparse matrices in the coordinate scheme format (COOS) as well as in the compressed row format (CSR). Now that we have implemented all these algorithms we want to know how much more performant they…
tzwickl
  • 1,341
  • 2
  • 15
  • 31
10
votes
1 answer

floating point operations per cycle - intel

I have been looking for quite a while and cannot seem to find an official/conclusive figure quoting the number of single precision floating point operations/clock cycle that an Intel Xeon quadcore can complete. I have an Intel Xeon quadcore E5530…
user3495341
  • 133
  • 1
  • 1
  • 8
9
votes
3 answers

Do RFID tags have a processor?

Do RFID tags have a "real" processor capable of simple computations? If so, what is the processing power of today's RFID processors?
qertoip
  • 1,870
  • 1
  • 17
  • 29
8
votes
5 answers

What counts as a flop?

Say I have a C program that in pseudocode is: For i=0 to 10: x++; a=2+x*5; next. Is the number of FLOPs for this (1 [x++] + 1 [x*5] + 1 [2+(x*5)]) * 10 [loop], for 30 FLOPs? I am having trouble understanding what a flop is. Note the [...] are…
Joshua Enfield
  • 17,642
  • 10
  • 51
  • 98
7
votes
2 answers

Why are math libraries often compared by FLOPS?

Math libraries are very often compared based on FLOPS. What information is being conveyed to me when I'm shown a plot of FLOPS vs size with sets of points for several different math libraries? FLOPS as a measure of performance would make more sense…
Praxeolitic
  • 22,455
  • 16
  • 75
  • 126