I'm trying to build the smallest possible C program to see how many instructions are executed when it runs. I disabled the use of libraries and disabled vdso. Yet my C program, which gdb shows as 7 assembly instructions, ends up executing about 17k instructions according to perf stat.
Is this a normal amount of instructions just to set up the program? According to gdb, code from ld-linux-x86-64.so.2 is mapped into the program address space. Given that I disabled vdso and am including no libraries, is this file necessary to run the program? Could this be the reason for the 17k instructions?
My C program, foo5.c:
int main(){
    char* str = "Hello World";
    return 0;
}
How I compile:
gcc -nostdlib -nodefaultlibs stubstart.S -o foo5 foo5.c
stubstart.S:
.globl _start
_start:
    call main
    movl $1, %eax
    xorl %ebx, %ebx
    int $0x80
perf stat output:
 Performance counter stats for './foo5':

          0.60 msec task-clock:u         #    0.015 CPUs utilized
             0      context-switches:u   #    0.000 K/sec
             0      cpu-migrations:u     #    0.000 K/sec
            11      page-faults:u        #    0.018 M/sec
        46,646      cycles:u             #    0.077 GHz
        17,224      instructions:u       #    0.37  insn per cycle
         5,145      branches:u           #    8.513 M/sec
           435      branch-misses:u      #    8.45% of all branches
gdb program layout:
`/home/foo5', file type elf64-x86-64.
Entry point: 0x5555555542b1
0x0000555555554238 - 0x0000555555554254 is .interp
0x0000555555554254 - 0x0000555555554278 is .note.gnu.build-id
0x0000555555554278 - 0x0000555555554294 is .gnu.hash
0x0000555555554298 - 0x00005555555542b0 is .dynsym
0x00005555555542b0 - 0x00005555555542b1 is .dynstr
0x00005555555542b1 - 0x00005555555542d5 is .text
0x00005555555542d5 - 0x00005555555542e1 is .rodata
0x00005555555542e4 - 0x00005555555542f8 is .eh_frame_hdr
0x00005555555542f8 - 0x0000555555554330 is .eh_frame
0x0000555555754f20 - 0x0000555555755000 is .dynamic
0x00007ffff7dd51c8 - 0x00007ffff7dd51ec is .note.gnu.build-id in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd51f0 - 0x00007ffff7dd52c4 is .hash in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd52c8 - 0x00007ffff7dd53c0 is .gnu.hash in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd53c0 - 0x00007ffff7dd56f0 is .dynsym in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd56f0 - 0x00007ffff7dd5914 is .dynstr in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5914 - 0x00007ffff7dd5958 is .gnu.version in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5958 - 0x00007ffff7dd59fc is .gnu.version_d in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5a00 - 0x00007ffff7dd5dd8 is .rela.dyn in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5dd8 - 0x00007ffff7dd5e80 is .rela.plt in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5e80 - 0x00007ffff7dd5f00 is .plt in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5f00 - 0x00007ffff7dd5f08 is .plt.got in /lib64/ld-linux-x86-64.so.2
0x00007ffff7dd5f10 - 0x00007ffff7df4b20 is .text in /lib64/ld-linux-x86-64.so.2
0x00007ffff7df4b20 - 0x00007ffff7df9140 is .rodata in /lib64/ld-linux-x86-64.so.2
0x00007ffff7df9140 - 0x00007ffff7df9141 is .stapsdt.base in /lib64/ld-linux-x86-64.so.2
0x00007ffff7df9144 - 0x00007ffff7df97b0 is .eh_frame_hdr in /lib64/ld-linux-x86-64.so.2
0x00007ffff7df97b0 - 0x00007ffff7dfbc24 is .eh_frame in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffc680 - 0x00007ffff7ffce64 is .data.rel.ro in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffce68 - 0x00007ffff7ffcfd8 is .dynamic in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffcfd8 - 0x00007ffff7ffcfe8 is .got in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffd000 - 0x00007ffff7ffd050 is .got.plt in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffd060 - 0x00007ffff7ffdfd8 is .data in /lib64/ld-linux-x86-64.so.2
0x00007ffff7ffdfe0 - 0x00007ffff7ffe170 is .bss in /lib64/ld-linux-x86-64.so.2
UPDATE:
In the end, jester's comment suggested building a standard (non-PIE) executable by adding the -no-pie flag to gcc, which removes ld.so from the picture. That reduced the instruction count reported by perf stat to 12. Then old_timer's -O2 suggestion reduced it further to 7! Thank you everyone.
UPDATE 2: The selected answer's suggestion of building with -static also reduces the instruction count from 17k to 12. Excellent answer.
Also this article linked by commenters is relevant and entertaining.