
I have this piece of code and I would like to make it as fast as possible.

I am not an experienced C++ developer, so I would love to know if you can come up with a really good reimplementation of this algorithm. I removed all the assignments, thinking it was a good thing to do...

And now I don't really know if that was the best thing to do.

So, what is faster?

for(register uint pPos = 0; pPos < size; pPos++) {
    img->setPixel(pPos % dst_w, pPos / dst_w,
                  buffer32[
                  sf * (
                    (pPos / dst_w * src_w) +
                    (pPos % dst_w)
                  )
            ]);
}

or

for(register uint pPos = 0, x = 0, y = 0; pPos < size; pPos++) {
    x = pPos % dst_w;
    y = pPos / dst_w;
    img->setPixel(x, y,
                  buffer32[
                  sf * (
                    (y * src_w) + x
                  )
            ]);
}

Side note: I really thought this was a good thing to ask; I don't understand the downvotes.

Also, thank you all for the comments. I learned a lot.

Rafael Fontes
  • Chances are good that compiler optimization will make them both equal. (In theory, without any optimization, the second version is probably faster, because doing the division and modulo twice is very slow compared to doing them once plus a few assignments. "Probably" because it depends on so many things...) – deviantfan Mar 27 '15 at 19:19
  • Side note: register is obsolete –  Mar 27 '15 at 19:19
  • You are writing in C++, not in assembler; don't try to outsmart your compiler by using `register` or moving simple arithmetic operations around. – Kos Mar 27 '15 at 19:20
  • Measure it. Do not guess (if it really matters that much; if it doesn't, go for the most readable version). – kraskevich Mar 27 '15 at 19:20
  • It's unusual to declare local variables that aren't used in the looping logic inside the for loop. – Neil Kirk Mar 27 '15 at 19:21
  • I suspect the downvotes are because experienced developers get frustrated with "which of these nearly identical code bits is faster" questions, because they are evidence of a bad antipattern in which less experienced developers write really bad code based on mistaken notions of what is "faster". – Gort the Robot Mar 27 '15 at 19:47

3 Answers


Which is faster depends entirely on the compiler, and in fact most optimizing compilers will compile both versions to essentially the same code anyway, because `pPos / dst_w` and `pPos % dst_w` are common subexpressions that only need to be computed once. Even if they don't, on a modern computer a division operation only takes a few nanoseconds, so unless you are doing that operation millions or billions of times, it probably just doesn't matter in your application.

In all cases like this, the answer is:

  1. Don't worry about it unless something is slower than you'd like
  2. If something is slower than you'd like, use a profiler to figure out exactly what is causing the issue.

EDIT

I just gave it a shot myself with this compiler:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.1.0
Thread model: posix

With this compiler, both of your versions compile to exactly the same assembly output with `g++ -S -O1`, so even the most basic compiler optimizations make this question moot. If you want your code to be as fast as possible, use your compiler's optimization flags and stop worrying about this kind of rearrangement.

Gort the Robot
  • Should I mention that the "size" variable holds the number of pixels in the image, and that this piece of code is called once every 33.3 milliseconds? – Rafael Fontes Mar 27 '15 at 19:26
  • @RafaelLucio In my opinion, there will be no observable difference in speed between the two code samples, whether it runs once every 33.3 milliseconds or once every 3 milliseconds. – Neil Kirk Mar 27 '15 at 19:28
  • A modern CPU will do on the order of 100 billion instructions per second. A 1080p image has about 2 million pixels, so adding two operations per pixel would add about 40 microseconds per frame. That is roughly 0.1% of your time limit. So by just a back-of-the-envelope calculation, it is pointless to worry about. Of course, as I mentioned, any decent compiler will optimize the code you gave so that there is no difference at all, not even the inconsequential one I just described. – Gort the Robot Mar 27 '15 at 19:38

The first is slower because it does the same calculation multiple times, though the extra time is probably negligible. You would not notice any improvement unless your code performs these operations thousands of times, or in fact many more. Moreover, your compiler may well optimize the first version to compute them only once anyway!

Use a profiler; there are many free ones out there. It will give you very good insight into how much time operations and functions take.

Check out General C++ Performance Improvement Tips and similar links for standard practices that will help you improve your coding standards.

Community

At the lowest level, simple assignments are faster than multiplications or additions.

Some processors have instructions that can perform multiplication or addition and assignment in one instruction.

Stepping back a level, assignments and arithmetic operations between registers are faster than performing the same operations on memory. Accessing cache is usually faster than accessing main memory: the further the data is from the processor core, the slower the access, so memory outside the chip is slower to access than memory on the same piece of silicon as the processor.

The Implications of Faster
So we know which operations are faster. The often overlooked questions are:

  • How much faster?
  • How much time is gained?

Let us take a hypothetical processor:

  • Assignments cost 20 nanoseconds.
  • Additions cost 50 nanoseconds.
  • Multiplications cost 100 nanoseconds.

So the "savings" between an addition and an assignment is 30 nanoseconds, and the savings between a multiplication and an addition is 50 nanoseconds. Remember that users cannot distinguish anything shorter than 1E-2 seconds. So, how many iterations will it take to make 50 nanoseconds noticeable?

With modern processors, a plethora of iterations must be performed to gain significant time from one of these instruction-level changes. So the return on investment (the time gained compared with the time it takes you to optimize these instructions) is not worthwhile. The ROI is only high when program performance affects sales or requirements (such as in critical systems).

Thomas Matthews
  • That's exactly what I was looking for... thank you! My goal is a kind of critical system that needs high performance with the lowest CPU usage, while also balancing quality. – Rafael Fontes Mar 28 '15 at 00:57
  • You should haul out the assembly language specification and data sheet that shows clock cycles per instruction. – Thomas Matthews Mar 28 '15 at 18:57
  • You are making assumptions about which processor instructions a C++ assignment is translated to, and there is not a lot of justification for those assumptions. You can't make statements like "Assignments cost 20 nanoseconds" because a particular statement may be optimized away, or the local may be put in a register, or who knows what else. A C++ assignment could be a NO-OP or a significant time suck depending on the particular processor and particular compiler. – Gort the Robot Mar 29 '15 at 00:28
  • Note that when you generate an asm file using `g++ -S` with no other flags, the default optimizations will generate the *exact same code* with a local pulled out like that. So talking about assembly while ignoring the compiler is massively misleading. – Gort the Robot Mar 29 '15 at 00:36
  • Note: From the low level perspective from the processor, assignment operations are faster than arithmetic operations. Addition operations are faster than multiplication operations. Regardless of how the compiler translates the code. Yes, the compiler may eliminate code, or translate code to use registers rather than memory. My answer is based on the low level perspective as I stated in my first paragraph. – Thomas Matthews Mar 30 '15 at 16:11
  • Assignment is an operation between variables. Variables only exist in high-level source code and may or may not be reflected in emitted machine code. There is no such thing as a low-level look on assignment. In all, an answer that makes you wonder, how uninformed one can be after decades in business. – IInspectable Nov 27 '15 at 21:21
  • @IInspectable: Yep, no low-level look on instructions that move from memory (variable) to a register, or move a constant from an area in memory to a register and then from a register into the memory (variable). Makes you wonder how uninformed one can be about the actual assembly language and microcode required to perform an assignment, whether in a high-level or a low-level language. I guess one can't assign values to registers or assign registers to ALU units. It's interesting how one can lose the perspective of how a computer actually works with today's tools. – Thomas Matthews Nov 28 '15 at 05:55