EDIT: Question: I wonder where (if anywhere) it says that the reference to extern const w in ret_global() can be optimized to an intermediate while the call to ret_42() in ret_fn_result cannot.
TL;DR: the logic behind this behavior (at least for GCC)
The compiler's constant folding is capable of inlining even complex const variables and structures.
The compiler's default behavior for functions is to export them. Unless the -fvisibility=hidden flag is used, all functions are exported. Because an exported function in position-independent code can be interposed at load time, GCC does not inline it. So the call to ret_42 in ret_fn_result cannot be inlined. With -fvisibility=hidden turned on, the result is as shown further below.
Suppose a function could be exported and inlined for optimization purposes at the same time. The result would be code that sometimes behaves one way (inlined) and sometimes another (overridden through interposition), even within a single load and run of the same executable.
Other flags also affect this behavior. The most notable:
- -Bsymbolic, -Bsymbolic-functions and --dynamic-list, as per SO.
- -fno-semantic-interposition
- and, of course, the optimization flags
This is ret_fn_result when ret_42 is hidden (not exported) and therefore inlined:
0000000000001110 <ret_fn_result>:
1110: b8 2b 00 00 00 mov $0x2b,%eax
1115: c3 retq
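As a rough sketch of the same effect without a global flag, GCC and Clang also accept a per-symbol visibility attribute. The snippet below is an illustration only, reusing the names from lib.c; it assumes you want ret_42 to stay internal to the library while ret_fn_result remains exported.

```c
/* Sketch: per-symbol alternative to building with -fvisibility=hidden.
   The attribute is a GCC/Clang extension. */
__attribute__((visibility("hidden")))
int ret_42(void) { return 42; }

/* Still exported. Because ret_42 is hidden it cannot be interposed,
   so the compiler is free to inline the call here. */
int ret_fn_result(void) { return ret_42() + 1; }
```

With this variant ret_42 no longer appears in the dynamic symbol table, so a preloaded library cannot replace it and main.c could no longer call it directly.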
Technicals
STEP #1, the subject is defined in lib.c:
SCOPE const struct wrap_ { const int x; } ptr = { 42 };
SCOPE struct wrap { const struct wrap_ *ptr; } const w = { &ptr };
int ret_global(void) { return w.ptr->x; }
When lib.c is compiled, w.ptr->x is folded to a constant. The dynamic symbol table of the resulting library looks like this:
$ objdump -T lib.so
lib.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
0000000000001110 g DF .text 0000000000000006 Base ret_42
0000000000002000 g DO .rodata 0000000000000004 Base ptr
0000000000001120 g DF .text 0000000000000006 Base ret_global
0000000000001130 g DF .text 0000000000000011 Base ret_fn_result
0000000000003e18 g DO .data.rel.ro 0000000000000008 Base w
Here ptr and w are placed in .rodata and .data.rel.ro respectively (the latter because w holds a const pointer that must be relocated at load time). Constant folding results in the following code:
0000000000001120 <ret_global>:
1120: b8 2a 00 00 00 mov $0x2a,%eax
1125: c3 retq
Another part is:
int ret_42(void) { return 42; }
int ret_fn_result(void) { return ret_42()+1; }
Here ret_42 is a function and, since it is not hidden, it is an exported function. So it stays real code that is called rather than folded. The two functions compile to:
0000000000001110 <ret_42>:
1110: b8 2a 00 00 00 mov $0x2a,%eax
1115: c3 retq
0000000000001130 <ret_fn_result>:
1130: 48 83 ec 08 sub $0x8,%rsp
1134: e8 f7 fe ff ff callq 1030 <ret_42@plt>
1139: 48 83 c4 08 add $0x8,%rsp
113d: 83 c0 01 add $0x1,%eax
1140: c3 retq
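If the library should always use its own implementation internally while still exporting ret_42, one common source-level idiom is to route internal calls through a static helper. This is only a sketch; the name ret_42_impl is hypothetical and does not appear in the original lib.c.

```c
/* Hypothetical helper name; not in the original lib.c.
   A static function has internal linkage and cannot be interposed. */
static int ret_42_impl(void) { return 42; }

/* Exported entry point; other objects may still interpose this symbol. */
int ret_42(void) { return ret_42_impl(); }

/* Calls the static helper directly, so no PLT lookup is involved
   and the compiler may inline and fold the result. */
int ret_fn_result(void) { return ret_42_impl() + 1; }
```

The trade-off is that a preloaded ret_42 would then affect callers outside the library only, not ret_fn_result.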
Since the compiler only knows about lib.c, we are done here. Put lib.so aside.
STEP #2, compile lib_override.c:
int ret_42(void) { return 50; }
#define SCOPE
SCOPE const struct wrap_ { const int x; } ptr = { 60 };
SCOPE struct wrap { const struct wrap_ *ptr; } const w = { &ptr };
Which is simple:
$ objdump -T lib_override.so
lib_override.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
00000000000010f0 g DF .text 0000000000000006 Base ret_42
0000000000002000 g DO .rodata 0000000000000004 Base ptr
0000000000003e58 g DO .data.rel.ro 0000000000000008 Base w
ret_42 is an exported function, and ptr and w are again placed in .rodata and .data.rel.ro respectively. The code for ret_42 is:
00000000000010f0 <ret_42>:
10f0: b8 32 00 00 00 mov $0x32,%eax
10f5: c3 retq
STEP #3, compile main.c. Let's look at the object file first:
$ objdump -t main.o
# SKIPPED
0000000000000000 *UND* 0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000 *UND* 0000000000000000 ret_42
0000000000000000 *UND* 0000000000000000 printf
0000000000000000 *UND* 0000000000000000 ret_fn_result
0000000000000000 *UND* 0000000000000000 ret_global
0000000000000000 *UND* 0000000000000000 w
All of these symbols are undefined, so they have to come from somewhere.
We then link against lib.so by default, and the code is (printf and others are omitted):
0000000000001070 <main>:
1074: e8 c7 ff ff ff callq 1040 <ret_42@plt>
1089: e8 c2 ff ff ff callq 1050 <ret_fn_result@plt>
109e: e8 bd ff ff ff callq 1060 <ret_global@plt>
10b3: 48 8b 05 2e 2f 00 00 mov 0x2f2e(%rip),%rax # 3fe8 <w>
Now we have lib.so, lib_override.so and a.out in hand.
Let's simply run a.out:
- main => ret_42 => lib.so => ret_42 => return 42
- main => ret_fn_result => lib.so => ret_fn_result => return ( lib.so => ret_42 => return 42 ) + 1
- main => ret_global => lib.so => ret_global => return folded constant 42
- main => lib.so => w.ptr->x = rodata 42
Now let's preload lib_override.so:
- main => ret_42 => lib_override.so => ret_42 => return 50
- main => ret_fn_result => lib.so => ret_fn_result => return ( lib_override.so => ret_42 => return 50 ) + 1
- main => ret_global => lib.so => ret_global => return folded constant 42
- main => lib_override.so => w.ptr->x = rodata 60
For 1: main calls ret_42 from lib_override.so because it is preloaded, so ret_42 now resolves to the one in lib_override.so.
For 2: main calls ret_fn_result from lib.so, which calls ret_42, but from lib_override.so, because the symbol now resolves to the one in lib_override.so.
For 3: main calls ret_global from lib.so, which returns the folded constant 42.
For 4: main reads the extern pointer w, which now resolves to the one in lib_override.so because it is preloaded.
Finally, once lib.so has been generated with folded constants that are inlined, one can't demand that they be "overridable". If the intention is to have an overridable data structure, it should be defined in some other way (provide functions to manipulate it, don't use constants, etc.). When something is defined as a constant, the intention is clear, and the compiler acts on it. Even if that same symbol is later defined as non-constant in main.c or some other place, it cannot be unfolded back in lib.c.
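One way to read "provide functions" is an exported accessor, sketched below. The names get_wrap and default_wrap are hypothetical and not part of the original lib.c. The idea is that, with GCC's default handling of exported functions in -fpic code, the call to the accessor is resolved at load time just like the call to ret_42, so a preloaded library could supply its own get_wrap.

```c
struct wrap_ { const int x; };

/* Hypothetical default data, private to this library. */
static const struct wrap_ default_wrap = { 42 };

/* Hypothetical exported accessor; being exported, it can be interposed
   in the same way as ret_42 in the walkthrough above. */
const struct wrap_ *get_wrap(void) { return &default_wrap; }

/* Reads the value through the accessor instead of a const global,
   so the value is not baked in as long as the call is not inlined. */
int ret_global(void) { return get_wrap()->x; }
```

This only stays overridable under the conditions described earlier: building with -fvisibility=hidden, -fno-semantic-interposition or -Bsymbolic would let the call bind locally again.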
#!/bin/sh -eu
: ${CC:=gcc}
cat > lib.c <<EOF
int ret_42(void) { return 42; }
#define SCOPE
SCOPE const struct wrap_ { const int x; } ptr = { 42 };
SCOPE struct wrap { const struct wrap_ *ptr; } const w = { &ptr };
int ret_global(void) { return w.ptr->x; }
int ret_fn_result(void) { return ret_42()+1; }
EOF
cat > lib_override.c <<EOF
int ret_42(void) { return 50; }
#define SCOPE
SCOPE const struct wrap_ { const int x; } ptr = { 60 };
SCOPE struct wrap { const struct wrap_ *ptr; } const w = { &ptr };
EOF
cat > main.c <<EOF
#include <stdio.h>
int ret_42(void), ret_global(void), ret_fn_result(void);
struct wrap_ { const int x; };
extern struct wrap { const struct wrap_ *ptr; } const w;
int main(void)
{
printf("ret_42()=%d\n", ret_42());
printf("ret_fn_result()=%d\n", ret_fn_result());
printf("ret_global()=%d\n", ret_global());
printf("w.ptr->x=%d\n",w.ptr->x);
}
EOF
for c in *.c; do $CC -fpic -O2 "$c" -c; done
$CC lib.o -o lib.so -shared
$CC lib_override.o -o lib_override.so -shared
$CC main.o $PWD/lib.so
export LD_LIBRARY_PATH=$PWD
./a.out
LD_PRELOAD=$PWD/lib_override.so ./a.out