
I'm working on a big data optimization job. It's a very time-consuming process,

so I'd like to save operations wherever possible.

I remember reading something like "division takes much, much more time than addition",

but is there a chart that could give me a general idea of how many units of time these operations take?

such as:

addition          2 units
subtraction       5 units
multiplication   20 units
division         30 units
greater than     10 units
equals to         1 unit
Benny Ae
  • possible duplicate of [how to find CPU cycle for an assembly instruction](http://stackoverflow.com/questions/692718/how-to-find-cpu-cycle-for-an-assembly-instruction) – Iłya Bursov Jul 02 '14 at 23:45
  • I think it depends on both the data type (integer vs floating point) and the architecture you're using, and on whether you do jumps based on the comparison or not, because jumps can mess up the pipeline. Also, the mix of operations matters. In general I'd say addition, subtraction, comparison, multiplication << division, jump. Also see http://stackoverflow.com/questions/1146455/whats-the-relative-speed-of-floating-point-add-vs-floating-point-multiply – Herman Jul 03 '14 at 00:02
  • A cache miss (to memory) costs more than 100 cycles ☹ With pipelining (and the associated branch misprediction cost), superscalar execution (and SIMD), and out-of-order execution, simple cycle counting is not practical; it can take less time to do three independent adds than two dependent ones (see the sketch after these comments). It helps to understand basic costs, which include not just typical latency but also throughput and variability (e.g., a correctly predicted branch can be nearly free while a misprediction can cost more than 10 cycles), but do not expect a simple model to make accurate predictions. – Paul A. Clayton Jul 03 '14 at 02:07
  • It's not that simple; you usually need to talk about bandwidth and consider the number of ALU units/ports available per operation, unless you have a critical chain of dependencies that is latency-sensitive. – Leeor Jul 14 '14 at 22:03
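
To make the point about dependent versus independent adds concrete, here is a minimal microbenchmark sketch. It is not from the question or the comments; it assumes GCC or Clang on Linux/x86-64, compiled with plain -O2 (no -ffast-math, so the compiler cannot reassociate the floating-point additions and collapse either loop into a closed form), and the iteration count N is an arbitrary illustration value.

    /* Sketch: a chain of dependent adds is limited by add latency, while
       independent adds on separate accumulators can overlap in the pipeline. */
    #include <stdio.h>
    #include <time.h>

    #define N 200000000L

    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        double a = 0.0, b = 0.0, c = 0.0;

        /* Dependent: each addition must wait for the previous result. */
        double t0 = now();
        for (long i = 0; i < N; i++) {
            double x = (double)i;
            a += x; a += x; a += x;
        }
        double t1 = now();

        /* Independent: three accumulators, so the additions can overlap. */
        for (long i = 0; i < N; i++) {
            double x = (double)i;
            a += x; b += x; c += x;
        }
        double t2 = now();

        printf("dependent adds:   %.3f s\n", t1 - t0);
        printf("independent adds: %.3f s\n", t2 - t1);
        printf("checksum: %g\n", a + b + c);  /* keep the results live */
        return 0;
    }

On a typical out-of-order core the second loop tends to run noticeably faster even though it performs the same number of additions, which is one reason a single "units per operation" chart is hard to give.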

1 Answer


If you don't know for sure that your workload is execution-bound (i.e. that the bottleneck is the operations you're suggesting here), the first thing to do would be to establish that using a profiling tool like VTune, oprofile, gprof or perfmon. To elaborate on what Paul A. Clayton is saying, if your workload is "big data", then it's probably more sensitive to the effects of the memory hierarchy than to arithmetic performance. A profiling tool could tell you if this is the case and whether there's a specific part of the memory hierarchy where you should target your optimization efforts.
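
To illustrate the memory-hierarchy point, here is a minimal sketch that is not part of the original answer. It assumes a Linux/x86-64 machine with GCC or Clang at -O2 and roughly 256 MB of free RAM; the array size and stride are arbitrary illustration values. Both loops perform exactly the same number of additions and only the access pattern differs, yet the strided pass is typically several times slower. Running the program under perf stat -e cycles,instructions,cache-misses (or under one of the tools named above) shows the difference coming from cache misses rather than from extra arithmetic.

    /* Sketch: same arithmetic work, different memory access pattern. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (32L * 1024 * 1024)   /* 32M longs = 256 MB */
    #define STRIDE 1024              /* 8 KB jumps defeat the hardware prefetcher */

    static double now(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        long *data = malloc(N * sizeof *data);
        if (!data) return 1;
        for (long i = 0; i < N; i++) data[i] = i & 0xff;

        /* Sequential pass: prefetching keeps the adds fed from cache. */
        long sum1 = 0;
        double t0 = now();
        for (long i = 0; i < N; i++) sum1 += data[i];
        double t1 = now();

        /* Strided pass: the same N additions, but poor locality means
           far more time is spent waiting on the memory hierarchy. */
        long sum2 = 0;
        double t2 = now();
        for (long s = 0; s < STRIDE; s++)
            for (long i = s; i < N; i += STRIDE)
                sum2 += data[i];
        double t3 = now();

        printf("sequential: %.3f s (sum %ld)\n", t1 - t0, sum1);
        printf("strided:    %.3f s (sum %ld)\n", t3 - t2, sum2);
        free(data);
        return 0;
    }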

Aaron Altman