Intel C++ optimizer removes masm code

Question

I recently started using the Intel C++ compiler for some of my projects, while also learning MASM assembly. I kept hearing that it wasn't worth learning assembly since compilers do a good job of optimizing code anyway, so I decided to check which one was faster once and for all. To do so, I wrote the following C++ code:

#include <iostream>
#include <time.h>

using namespace std;

extern "C" {
int Add(int a, int b);
}


int main(int argc, char * argv[]){
    int startingTime = clock();
    for (int i = 0; i < 100; i++)
    {
        cout << "normal: " << i << endl;
        cout << 1000 + 1000 << endl;
    }
    int timeTaken1 = clock() - startingTime;

    startingTime = clock();
    for (int i = 0; i < 100; i++){
        cout << "assem" << i << endl;
        cout << Add(2000, 2000) << endl;
    }
    int timeTaken2 = clock() - startingTime;

    cout << "Time taken under normal addition: " << timeTaken1 << endl;
    cout << "Time taken under assembly addition: " << timeTaken2 << endl;

    cin.get();
    return 0;
}

And the following MASM code:

.model flat
.386

.code

    public _Add

_Add PROC
        push ebp            ;
        mov ebp, esp        ;
        mov eax, [ebp + 8]  ;
        mov ebx, [ebp + 12] ;
        add eax, ebx        ;
        leave               ; cleanup
        ret                 ;


_Add endp
end

I am compiling this in Visual Studio with the Intel Composer plugin. When I run it in Debug mode, it works perfectly: I can see "normal 99" and "assem 99" along with the relevant number. When I compile with /Od, it also works fine. However, when /O2, /Ox or /O3 is specified, it only shows the normal (i+j) addition loop and the first value of the assembler addition, i.e. only assem 0 and 4000 are shown.

My guess is that the assembly code is being optimized out by the Intel compiler (this works fine with the VC++ compiler). I am curious to find out why this happens and how it can be worked around while still letting Intel optimize the C++ part.

Thanks SbSpider

EDIT: I know this is late, but thanks for all of the replies. It seems that it was an error in the assembly code rather than the Intel compiler not using the assembly code.

micro-benchmarks are tricky. If I see it right, you are trying to benchmark int-addition in C++ vs asm, and falling flat on your nose. FYI: The C++ code won't do any addition, that's optimized out, maybe even already at -O0. No idea how and whether the asm will be optimized though. — Deduplicator, Aug 17 '14 at 00:53
Should you not tell the compiler you are changing `eax` and `ebx`? Quite likely, the optimized code uses one of these to store `i` in. — Jongware, Aug 17 '14 at 01:03
By the way, as @Deduplicator already stated: `1000 + 1000` is such a clear candidate for "optimization" (in this case, constant folding) that it most likely will *never* appear as such in the generated code. Caveat #2: even if you replace these with variables, the native compiler will most likely keep them in registers and add these, where your own assembler function always first has to load them. — Jongware, Aug 17 '14 at 01:13
Doing any IO, like writing to stdout, will completely overshadow the length of time taken for computations like this. In addition, retrieving variables' values from memory and adding will probably take ~10 cycles (a few nanoseconds), while the resolution of the clock is probably 1000 to 1000000 times longer. — NicholasM, Aug 17 '14 at 06:44
The fact that you should not, for most day-to-day purposes, be *working* in assembly, does *not* mean that it isn't worth *learning*. I think learning it would make anyone a better programmer, as it gets you closest to how the machine really "thinks". You can then apply that knowledge even when you're working in higher-level languages. — William McBrine, Aug 17 '14 at 07:58
This was not day to day, more for just "fun". But thanks for the answers, since they do clarify what is going on. PS - the benchmarks were just there for initial playing. The plan was to get integer addition working, and then have a look at some of the more complex problems. — Sbspider, Oct 13 '14 at 08:08
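
The measurement problems raised in the comments above (constant folding, I/O inside the timed loop, coarse clock resolution) could be avoided along these lines. This is only a sketch under assumptions: `AddCpp` and `TimeAdds` are hypothetical names that do not appear in the question, and `AddCpp` merely stands in for the external assembly routine.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the routine under test; in the real project the
// extern "C" Add from the .asm file would be passed instead.
static int AddCpp(int a, int b) { return a + b; }

// Times `iterations` calls of `fn` and returns the elapsed nanoseconds.
// The operands are read through volatile objects so the compiler cannot fold
// the addition into a constant, and each result is stored to a volatile sink
// so the calls cannot be discarded. No I/O happens inside the timed region.
template <typename Fn>
std::int64_t TimeAdds(Fn fn, int iterations) {
    assert(iterations > 0);
    volatile int a = 1000;
    volatile int b = 1000;
    volatile int sink = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = fn(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();

    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `TimeAdds(AddCpp, 1000000)` and `TimeAdds(Add, 1000000)` and printing both results only afterwards keeps the `cout` cost out of the measurement. Even then, timing a call to a one-instruction function mostly measures call overhead rather than the addition itself.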

score 3 · Accepted Answer · answered Aug 17 '14 at 03:16

Your assembly code is trashing the EBX register (as Jongware noted), and this is likely why the second loop in your C++ code is only executed once. If i is being stored in EBX, then changing EBX to 2000 in Add will cause the next test of the loop condition i < 100 to fail.

You need to either save and restore the EBX register in your assembly code, or pick another register that isn't assumed to be preserved across function calls (EAX, EDX, or ECX).

The `ebx` register isn't needed! The `add` can use a memory operand. — Gene, Aug 17 '14 at 04:01
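
As a sketch of the fix discussed above, the procedure could add the second argument directly from the stack so that EBX is never touched. This has not been assembled or run here, and it assumes the same 32-bit cdecl setup as the question (arguments at [ebp + 8] and [ebp + 12], result returned in EAX).

```asm
_Add PROC
        push ebp                ; save the caller's frame pointer
        mov  ebp, esp           ; set up this function's frame
        mov  eax, [ebp + 8]     ; eax = a (first argument)
        add  eax, [ebp + 12]    ; eax = a + b, with b read straight from the stack
        leave                   ; restore esp and ebp
        ret                     ; result is in eax; the caller cleans up the stack
_Add ENDP
```

If a second register is still wanted, using ECX or EDX in place of EBX, or wrapping the body in push ebx / pop ebx, should have the same effect.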
