Hi, could anyone help me understand why calling math library functions is more efficient than writing inline assembly code to perform the same operation? I wrote this simple test:
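#define _GNU_SOURCE   /* feature-test macro: lets <math.h> define M_PI_2 even under strict -std= modes; must precede all #includes */
#include <stdio.h>
#include <math.h>

int main(void)
{
    float ang;
    int i;

    for (i = 0; i < 1000000; i++) {
        ang = M_PI_2 * i / 2000000;

        /* Version 2: x87 inline assembly.
           fptan replaces ST(0) with tan(x) and then pushes 1.0, so after the
           fxch/fstps pair the 1.0 is still on the x87 stack; the final
           fstp pops it so the register stack does not grow every iteration.
        __asm__ ( "flds %0;"
                  "fptan;"
                  "fxch;"
                  "fstps %0;"
                  "fstp %%st(0);"
                  : "+m" (ang) : : "st", "st(1)" );
        */

        /* Version 1: libm */
        ang = tanf(ang);
    }
    printf("Tan(ang): %f\n", ang);
    return 0;
}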
That code computes the tangent of an angle in two different ways: one calls the tanf function from the dynamically linked math library (libm.so), and the other uses inline assembly. Note that I comment out one version or the other alternately. The code repeats the operation many times so that the time command in a Linux terminal gives meaningful results.
The version that uses the math library takes around 0.040 s. The version that uses assembly takes around 0.440 s, roughly ten times longer.
These are the disassembled loops of the two versions. Both were compiled with the -O3 option.
LIBM
4005ad: b8 db 0f c9 3f mov $0x3fc90fdb,%eax
4005b2: 89 45 f8 mov %eax,-0x8(%rbp)
4005b5: f3 0f 10 45 f8 movss -0x8(%rbp),%xmm0
4005ba: e8 e1 fe ff ff callq 4004a0 <tanf@plt>
4005bf: f3 0f 11 45 f8 movss %xmm0,-0x8(%rbp)
4005c4: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4005c8: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
4005cc: 7e df jle 4005ad <main+0x19>
ASM
40050d: b8 db 0f c9 3f mov $0x3fc90fdb,%eax
400512: 89 45 f8 mov %eax,-0x8(%rbp)
400515: d9 45 f8 flds -0x8(%rbp)
400518: d9 f2 fptan
40051a: d9 c9 fxch %st(1)
40051c: d9 5d f8 fstps -0x8(%rbp)
40051f: 83 45 fc 01 addl $0x1,-0x4(%rbp)
400523: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
400527: 7e e4 jle 40050d <main+0x19>
Any idea? Thanks.
I think I have an idea. Browsing the glibc source, I found that the tanf function is implemented with a polynomial approximation using SSE instructions. I guess that turns out to be faster than the microcode behind the fptan instruction.