I'm developing a shared library which can be executed independently to print its own version number.

I've defined a custom entry point as:

const char my_interp[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2";

void my_main() {
   printf("VERSION: %d\n", 0);
   _exit(0);
}

and I compile with

gcc -o list.os -c -g -Wall -fPIC list.c
gcc -o liblist.so -g -Wl,-e,my_main -shared list.os -lc

This code compiles and runs perfectly.

My issue is when I change the parameter of the printf to be a float or double (%f or %lf). The library will then compile but segfault when run.

Anyone have any ideas?

edit1:

Here is the code that segfaults:

const char my_interp[] __attribute__((section(".interp"))) = "/lib64/ld-linux-x86-64.so.2"; 

void my_main() { 
    printf("VERSION: %f\n", 0.1f); 
    _exit(0); 
} 

edit2:

Additional environmental details:

uname -a

Linux mjolnir.site 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078) x86_64 x86_64 x86_64 GNU/Linux

gcc --version

gcc (SUSE Linux) 4.6.2

/lib64/libc.so.6

Configured for x86_64-suse-linux. Compiled by GNU CC version 4.6.2. Compiled on a Linux 3.1.0 system on 2012-03-30.

edit 3:

Output in /var/log/messages upon segfault:

Aug 11 08:27:45 mjolnir kernel: [10560.068741] liblist.so[11222] general protection ip:7fc2b3cb2314 sp:7fff4f5c7de8 error:0 in libc-2.14.1.so[7fc2b3c63000+187000]

kobrien
  • This works perfectly on my 32 bits machine. And it should work on 64 bits too. did you include and ? – TOC Aug 11 '12 at 06:40
  • @TOC I did. Did you try print a floating point number as the version number? The code I posted works fine, except when printing a float. – kobrien Aug 11 '12 at 06:43
  • Can you show the code that prints the float? – Jonathan Leffler Aug 11 '12 at 06:43
  • @kobrien : Yes the code works fine on Linux (32 bits) for float and double – TOC Aug 11 '12 at 06:45
  • I get the same problem as you. And if I use -m32 with the obvious modification on .interp, it works. I've attempted to add -lm in case it made a difference, but that wasn't the problem. I've also added the missing includes, they could make a difference as printf is variadic, but it wasn't the case. – AProgrammer Aug 11 '12 at 06:46
  • @JonathanLeffler Question updated as requested. – kobrien Aug 11 '12 at 06:47
  • probably this is not your problem, `%f` is for `double` and `%Lf` is for `long double` (in `printf`, `%lf` means the same as `%f`). There is no format for `float` since these are converted to `double` anyhow for variadic functions like `printf`. – Jens Gustedt Aug 11 '12 at 06:49
  • @kobrien what about bt inside gdb? – TOC Aug 11 '12 at 06:51
  • As long as there isn't a prototype for `printf()` that says the second argument is a float, then the float should be promoted to double anyway. But I'd certainly be curious to know if it works 'better' (without crashing) if you use `0.1` instead of `0.1f`. – Jonathan Leffler Aug 11 '12 at 06:51
  • @JonathanLeffler, it doesn't (my attempt was with 0.0 before the OP gave his code). – AProgrammer Aug 11 '12 at 06:53
  • @JonathanLeffler Crashes on 0.1 and 0.1f. – kobrien Aug 11 '12 at 06:53
  • @TOC backtrace doesn't show anything up. I've all relevant debug symbols compiled in. Still get "Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7a9b314 in ?? ()" – kobrien Aug 11 '12 at 06:56
  • Is the problem that `stdout` isn't set up by the start-up code in this context? Normally in an executable, a file such as `crt0.o` is included which deals with setup work such as ensuring `stdout` is available? – Jonathan Leffler Aug 11 '12 at 06:58
  • I tried it, and other printfs work OK, for example with %s. Also, other floating-point operations, such as sin and cos, work OK. It's just printing floating-point numbers that doesn't work. Perhaps it is that some part of the C library isn't loaded properly? – Thomas Padron-McCarthy Aug 11 '12 at 07:02
  • It's reasonable to assume this is down to a missing initialization of stdout. I shall do some sleuthing. – kobrien Aug 11 '12 at 07:04
  • Another thing: When it crashes, this message gets printed to the system log: Aug 11 09:08:28 bimbatron kernel: [69291.763774] liblist.so[6839] general protection ip:7fcba4478064 sp:7fffaa66af78 error:0 in libc-2.13.so[7fcba4427000+18a000] – Thomas Padron-McCarthy Aug 11 '12 at 07:09
  • @ThomasPadron-McCarthy I can confirm that happens on my machine too. As of yet, people are suggesting recreating /dev/null as a solution, but that seems hackish and unsatisfactory. – kobrien Aug 11 '12 at 07:14
  • This appears to be the class of error occurring: http://en.wikipedia.org/wiki/General_protection_fault – kobrien Aug 11 '12 at 07:17
  • "Recreating /dev/null as a solution"? I don't understand that at all. – Thomas Padron-McCarthy Aug 11 '12 at 07:32

1 Answer

Figured it out. :)

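Floating-point operations on x86_64 use the XMM vector registers. Aligned accesses to these (such as `movaps` loads and stores to the stack) must be on 16-byte boundaries, so the stack pointer has to be 16-byte aligned when `printf` is called. This explains why 32-bit platforms were unaffected and why integer and character printing worked.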

I've compiled my code to assembly with:

gcc -W list.c -o list.S -shared -Wl,-e,my_main -S -fPIC

then altered the "my_main" function to reserve 8 more bytes of stack space.

Before:

my_main:
 .LFB6:
 .cfi_startproc
 pushq   %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 movq    %rsp, %rbp
 .cfi_def_cfa_register 6
 movl    $.LC0, %eax
 movsd   .LC1(%rip), %xmm0
 movq    %rax, %rdi
 movl    $1, %eax
 call    printf
 movl    $0, %edi
 call    _exit
 .cfi_endproc

After:

my_main:
 .LFB6:
 .cfi_startproc
 pushq   %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 subq    $8, %rsp  /* ADDED THIS LINE */
 movq    %rsp, %rbp
 .cfi_def_cfa_register 6
 movl    $.LC0, %eax
 movsd   .LC1(%rip), %xmm0
 movq    %rax, %rdi
 movl    $1, %eax
 call    printf
 movl    $0, %edi
 call    _exit
 .cfi_endproc

Then I compiled this .S file by:

gcc list.S -o liblist.so -Wl,-e,my_main -shared

This fixes the issue, but I will forward this thread to the GCC and GLIBC mailing lists, as it looks like a bug.
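Why the extra `subq $8, %rsp` helps can be sketched as plain arithmetic on the stack pointer. This is only a model, and it rests on an assumption not verified in this thread: that the custom entry point is entered with `%rsp` already 16-byte aligned (as the x86-64 ABI describes for process entry), whereas an ordinary function is entered with `%rsp` 8 bytes off because `call` has just pushed a return address. The function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when sp is a multiple of 16, which is what the x86-64 ABI
   expects of %rsp immediately before a call instruction. */
bool aligned_for_call(uint64_t sp) {
    return (sp % 16) == 0;
}

/* %rsp just before "call printf" in the original my_main:
   only "pushq %rbp" has moved the stack pointer. */
uint64_t sp_before_call_original(uint64_t entry_sp) {
    return entry_sp - 8;
}

/* %rsp just before "call printf" in the patched my_main:
   "pushq %rbp" followed by the added "subq $8, %rsp". */
uint64_t sp_before_call_patched(uint64_t entry_sp) {
    return entry_sp - 8 - 8;
}
```

With an aligned entry value such as `0x7fff0000` (a made-up address), the original prologue leaves the stack pointer 8 bytes off at the call, while the patched one restores 16-byte alignment. The same original prologue would be fine in a function reached through a normal `call`, where the entry value is already offset by 8.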

edit1:

According to noshadow on the gcc IRC channel, this is a non-standard way to do this. He said that if one uses the -e option with gcc, one should either initialize the C runtime manually or not use libc functions. Makes sense.
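A possible way to avoid hand-editing the assembly, offered as an untested sketch rather than something confirmed in this thread: GCC's `force_align_arg_pointer` function attribute (x86 targets) makes a function realign the stack in its own prologue. It addresses only the alignment problem, not the wider point about initializing the C runtime. `REALIGN_STACK` and `format_version` are hypothetical names, and the `.interp` definition and build commands from the question are assumed to stay unchanged. The output goes through `write` because `_exit` does not flush stdio buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* The attribute is specific to GCC-compatible compilers on x86;
   elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical helper: formats the version line into buf and
   returns the length snprintf reports. */
int format_version(char *buf, size_t size, double version) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "VERSION: %f\n", version);
}

/* Custom entry point (selected with -Wl,-e,my_main as in the question).
   The prologue realigns the stack before any libc call is made. */
REALIGN_STACK void my_main(void) {
    char line[64];
    int len = format_version(line, sizeof line, 0.1);

    if (len > 0 && (size_t)len < sizeof line) {
        if (write(STDOUT_FILENO, line, (size_t)len) < 0) {
            _exit(1);
        }
    }
    _exit(0);
}
```

Keeping the formatting in a separate helper makes it possible to exercise that part from an ordinary test program, since `my_main` itself never returns.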

kobrien
  • ISTR that there is an option for gcc to align the stack on larger boundaries than the default one. You may be able to use it as a workaround until the bug is fixed instead of modifying the assembly. – AProgrammer Aug 11 '12 at 11:08
  • Sounds good. Alternative way is to not use libc functions and use syscalls directly, but note this is more code and less portable. – kobrien Aug 11 '12 at 11:26
  • The instructions that load single floating-point values into XMM registers do not require 16-byte alignment. They only require four-byte alignment for single-precision and eight-byte alignment for double-precision. You may have been encountering other issues, such as violating the Application Binary Interface requirements for calling subroutines (might have aligned the stack incorrectly, might have failed to set a bit that indicates floating-point parameters are passed, or other problems). – Eric Postpischil Aug 11 '12 at 18:21
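The syscall-only alternative mentioned in the comments could look roughly like the sketch below. It assumes Linux on x86_64 and a GCC-compatible compiler, prints an integer version because formatting a float without libc takes considerably more code, and uses hypothetical names throughout (`raw_syscall3`, `format_version`, the `SYS_NR_*` constants). A truly libc-free build may also need compiler options to stop loops being turned into `memcpy` calls, which is not shown here.

```c
#include <stddef.h>

#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86_64"
#endif

/* Linux x86_64 system call numbers. */
enum { SYS_NR_WRITE = 1, SYS_NR_EXIT_GROUP = 231 };

/* Issues a system call with up to three arguments, bypassing libc.
   The syscall instruction clobbers rcx and r11. */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Writes "VERSION: <n>\n" into buf (at least 32 bytes, not
   NUL-terminated) and returns the number of bytes written. */
size_t format_version(char *buf, unsigned version) {
    static const char prefix[] = "VERSION: ";
    char digits[10]; /* enough for a 32-bit unsigned value */
    size_t len = 0, n = 0;

    for (size_t i = 0; i < sizeof prefix - 1; i++) {
        buf[len++] = prefix[i];
    }
    do { /* collect digits in reverse order */
        digits[n++] = (char)('0' + version % 10);
        version /= 10;
    } while (version != 0);
    while (n > 0) {
        buf[len++] = digits[--n];
    }
    buf[len++] = '\n';
    return len;
}

/* Entry point: realigns the stack, writes to fd 1, then exits. */
__attribute__((force_align_arg_pointer)) void my_main(void) {
    char line[32];
    size_t len = format_version(line, 0);

    raw_syscall3(SYS_NR_WRITE, 1, (long)line, (long)len);
    raw_syscall3(SYS_NR_EXIT_GROUP, 0, 0, 0);
}
```

As the comment notes, this trades portability for independence from the C runtime: the system call numbers and register conventions are specific to one kernel and architecture.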