C++ inline assembly (Intel compiler): LEA and MOV behaving differently in Windows and Linux

Question

I am converting a huge Windows DLL to work on both Windows and Linux. The DLL has a lot of assembly (including SSE2 instructions) for video manipulation.

The code now compiles fine on both Windows and Linux, using the Intel compiler included in Intel ComposerXE-2011 on Windows and Intel ComposerXE-2013 SP1 on Linux.

Execution, however, crashes on Linux when trying to call a function pointer. I traced the code in gdb and indeed the function pointer doesn't point to the required function (whereas on Windows it does). Almost everything else works fine.

This is the sequence of code:

...
mov    rdi, this
lea    rdx, [rdi].m_sSomeStruct
...
lea    rax, FUNCTION_NAME                # if replaced by 'mov', works in Linux but crashes in Windows
mov    [rdx].m_pfnFunction, rax
...
call   [rdx].m_pfnFunction               # crash in Linux

where:

1) 'this' has a struct member m_sSomeStruct.

2) m_sSomeStruct has a member m_pfnFunction, which is a pointer to a function.

3) FUNCTION_NAME is a free function in the same compilation unit.

4) All those pure assembly functions are declared as naked.

5) 64-bit environment.

What is confusing me the most is that if I replace the 'lea' instruction that is supposed to load the function's address into rax with a 'mov' instruction, it works fine on Linux but crashes on Windows. I traced the code in both Visual Studio and gdb and apparently in Windows 'lea' gives the correct function address, whereas in Linux 'mov' does.

I tried looking into the Intel assembly reference but didn't find much to help me there (unless I wasn't looking in the right place).

Any help is appreciated. Thanks!

Edit: more details:

1) I tried using square brackets

lea    rax, [FUNCTION_NAME]

but that didn't change the behaviour on either Windows or Linux.

2) I looked at the disassembly in gdb and on Windows; both seem to show the same instructions that I actually wrote. What's even worse is that I tried putting lea and mov one after the other, and when I look at them in the gdb disassembly, the address printed after the # sign following each instruction (which I'm assuming is the address that's going to be stored in the register) is actually the same, and is NOT the correct address of the function.

It looked like this in the gdb disassembly:

lea  0xOffset1(%rip), %rax   # 0xSomeAddress
mov  0xOffset2(%rip), %rax   # 0xSomeAddress

where both addresses (0xSomeAddress) were identical and the two offsets differed by exactly the distance between the lea and mov instructions. But somehow, when I check the contents of the registers after each execution, mov seems to put in the correct value!

3) The member variable m_pfnFunction is of type LOAD_FUNCTION which is defined as

typedef void (*LOAD_FUNCTION)(const void*, void*);

4) The function FUNCTION_NAME is declared in the .h (within a namespace) as

void FUNCTION_NAME(const void* , void*);

and implemented in .cpp as

__declspec(naked) void namespace_name::FUNCTION_NAME(const void* , void*)
{
...
}

5) I tried turning off optimizations by adding

#pragma optimize("", off)

but I still have the same issue

What if you use `lea rax,[FUNCTION_NAME]` ? Does that give the same result on both platforms? — Michael, Jun 06 '14 at 14:16
Have you compared the generated machine code? The optimizer might be doing something funny. — nobody, Jun 06 '14 at 14:56
That's my bet too, you're fighting (and losing) against the optimizer on Linux, especially since you use different versions of your compiler. As a (really ugly and frail) work-around, you can use `#if` to use `lea`/`mov` depending on what works on which platform. — Blindy, Jun 06 '14 at 16:58
Yes, Michael, I forgot to mention that. I tried using square brackets and that gave the same result in Windows (i.e. still worked) and the same result in Linux (i.e. still didn't work!). — user3715251, Jun 06 '14 at 17:44
@Andrew, I looked at the disassembly in gdb and on Windows; both seem to show the same instructions that I actually wrote. What's even worse is that I tried putting `lea` and `mov` one after the other, and when I look at them in the gdb disassembly, the address printed after the # sign following each instruction (which I'm assuming is the address that's going to be stored in the register) is actually the same, and is NOT the correct address of the function. But somehow, when I check the contents of the registers after each execution, `mov` seems to put in the correct value! — user3715251, Jun 06 '14 at 17:51
@Blindy, I am already (temporarily) using such a #define but I am really trying to avoid that. I'll try to disable optimizations in that compilation unit by `#pragma optimize("", off)` and see if that changes anything — user3715251, Jun 06 '14 at 17:55
If `mov` works and `lea` doesn't then that means that the `FUNCTION_NAME` symbol is a pointer to the function in Linux. Show us the declaration of `FUNCTION_NAME`. Is it an exported function? — Michael Burr, Jun 06 '14 at 18:09
@Michael It's not an exported function. I just added the declaration above to my post. But isn't `FUNCTION_NAME` always supposed to be a pointer to the function whether in Linux or Windows? — user3715251, Jun 06 '14 at 18:46
Since this problem appears to be delving into fiddly details, maybe you should post a reduced, compilable example so that others can reproduce exactly what you're seeing. Linux might be using a GOT (Global Offset Table) for function addresses (particularly exports or externals). I'm not sure about the implementation details. — Michael Burr, Jun 06 '14 at 20:30
And when you post a repro example, also let us know the compiler options being used to build for each platform. — Michael Burr, Jun 06 '14 at 20:36
Seeing that the issue is derived from Intel's flagship compiler and IDE - I'd be asking them this question (and perhaps looking through some release notes on updates to see if there were any bugs that may relate) — Litch, Jun 07 '14 at 14:40
@Michael I will work on a reduced compilable code that produces the same behaviour. — user3715251, Jun 09 '14 at 20:35
@Litch Good idea. I posted the issue on Intel Development Forums. — user3715251, Jun 09 '14 at 20:36

score 5 · Answer 1 · answered Jun 23 '14 at 08:08

Offhand, I suspect that the way linking to DLLs works in the latter case (Linux, where mov works) is that FUNCTION_NAME is a memory location that will actually be set to the loaded address of the function. That is, it's a reference (or pointer) to the function, not the entry point.

I'm familiar with Windows (not Linux), and I've seen how calling a function might either:

(1) generate a CALL to that address, which is filled in at link time. This is normal for functions in the same module, but if it's discovered at link time that the function is in a different DLL, then the import library supplies a stub that the linker treats the same as any normal function but that is nothing more than JMP [????]. The table of addresses of imported functions is arranged to have the bytes that encode a JMP instruction just before the field that will hold the address. The table is populated at DLL load time.

(2) If the compiler knows that the function will be in a different DLL, it can generate more efficient code: it emits an indirect CALL through the address located in the import table. The stub function shown in (1) has a symbol name associated with it, and the actual field containing the address has a symbol name too. They are both named for the function, but with different "decorations". In general, a program might contain fixup references to both.

So, I conjecture that the symbol name you used matches the stub function on one platform and (assuming the other works in a similar way) matches the pointer on the other. Maybe the assembler assigns the unmangled name to one or the other depending on whether it is declared as imported, and the options are different on the two toolchains.

Hope that helps. I suppose you could look at run-time in a debugger and see if the above helps you interpret the address and the stuff around it.

Correct assumption. The question blindly assumes that a "pointer to function" is just a 32-bit number, when the actual requirement is just that it's something dereferenceable using whatever logic the particular compiler prefers. And that includes DLL jump tables. — MSalters, Jun 23 '14 at 15:48

score 0 · Answer 2 · edited May 23 '17 at 12:10

After reading about the difference between mov and lea in "What's the purpose of the LEA instruction?", it looks to me like on Linux there is one additional level of indirection in the function pointer. The mov instruction dereferences that extra level of indirection, while on Windows, without that extra indirection, you would use lea.

Are you by any chance compiling with PIC on Linux? I could see that adding the extra indirection layer.

answered Jun 23 '14 at 15:23

Mark B


I think that extra indirection is not needed for AMD64, which has native IP-relative addressing. Calling a function with RIP addressing mode would compute the delta, just like short jumps always have. Pointers would not be affected, because it doesn't need to consolidate the real fixup location into one place (basically treats all functions as imported). — JDługosz, Jun 24 '14 at 04:07
