strange behavior when trying to compile a source with tcc against gcc generated .o file

Question

I am trying to compile a source file with tcc (ver 0.9.26) against a gcc-generated .o file, but the result behaves strangely. The gcc (ver 5.3.0) is from MinGW 64-bit.

More specifically, I have the following two files (te1.c and te2.c). I ran the following commands on a Windows 7 box:

c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O  elf64-x86-64 te1.o   #this is needed because te1.o from the previous step is in COFF format; tcc only understands ELF format
c:\tcc> tcc te2.c te1.o
c:\tcc> te2.exe
567in dummy!!!

Note that the first 4 bytes of the string 1234567in dummy!!!\n are cut off. I wonder what could have gone wrong.

Thanks Jin

========file te1.c===========

#include <stdio.h>

void dummy () {
    printf1("1234567in dummy!!!\n");
}

========file te2.c===========

#include <stdio.h>

void printf1(char *p) {
    printf("%s\n",p);
}
extern void dummy();
int main(int argc, char *argv[]) {
    dummy();
    return 0;
}

Update 1

I saw a difference in the disassembly between te1.o (te1.c compiled by tcc) and te1_gcc.o (te1.c compiled by gcc). The tcc-compiled object contains lea -0x4(%rip),%rcx, while the gcc-compiled object contains lea 0x0(%rip),%rcx. Not sure why.

C:\temp>objdump -d te1.o

te1.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 81 ec 20 00 00 00    sub    $0x20,%rsp
   b:   48 8d 0d fc ff ff ff    lea    -0x4(%rip),%rcx        # e <dummy+0xe>
  12:   e8 fc ff ff ff          callq  13 <dummy+0x13>
  17:   c9                      leaveq
  18:   c3                      retq
  19:   00 00                   add    %al,(%rax)
  1b:   00 01                   add    %al,(%rcx)
  1d:   04 02                   add    $0x2,%al
  1f:   05 04 03 01 50          add    $0x50010304,%eax

C:\temp>objdump -d te1_gcc.o

te1_gcc.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <dummy+0xf>
   f:   e8 00 00 00 00          callq  14 <dummy+0x14>
  14:   90                      nop
  15:   48 83 c4 20             add    $0x20,%rsp
  19:   5d                      pop    %rbp
  1a:   c3                      retq
  1b:   90                      nop
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop

Update 2

Using a binary editor, I changed the machine code in te1.o (produced by gcc) from lea 0(%rip),%rcx to lea -0x4(%rip),%rcx. After linking it with tcc, the resulting exe works fine. More precisely, I did:

c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O  elf64-x86-64 te1.o 
c:\tcc> use a binary editor to change the bytes from (48 8d 0d 00 00 00 00) to (48 8d 0d fc ff ff ff)
c:\tcc> tcc te2.c te1.o
c:\tcc> te2
1234567in dummy!!!

Update 3

As requested, here is the output of objdump -r te1.o

C:\temp>gcc -c te1.c

C:\temp>objdump -r te1.o

te1.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
000000000000000b R_X86_64_PC32     .rdata
0000000000000010 R_X86_64_PC32     printf1


RELOCATION RECORDS FOR [.pdata]:
OFFSET           TYPE              VALUE
0000000000000000 rva32             .text
0000000000000004 rva32             .text
0000000000000008 rva32             .xdata



C:\temp>objdump -d te1.o

te1.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <dummy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   48 8d 0d 00 00 00 00    lea    0x0(%rip),%rcx        # f <dummy+0xf>
   f:   e8 00 00 00 00          callq  14 <dummy+0x14>
  14:   90                      nop
  15:   48 83 c4 20             add    $0x20,%rsp
  19:   5d                      pop    %rbp
  1a:   c3                      retq
  1b:   90                      nop
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop

Chances are `tcc` and `gcc` have different default calling conventions. Might want to check that. — Captain Obvlious, Jul 12 '16 at 19:14
Should `te1.c` have `extern void printf1(char *p);` or include a header that declares `printf()`? — RastaJedi, Jul 12 '16 at 19:26
Side issue: `extern void dummy();` should be `extern void dummy(void);`, else `main()` could call `dummy(1,2,3)` with no warning/error. — chux - Reinstate Monica, Jul 12 '16 at 20:10
Calling convention is something I worried about, wish there is a definitive answer. — packetie, Jul 12 '16 at 20:17
Added the `extern void printf1(char *p);` but it didn't make any difference. Thanks for pointing out `extern void dummy(void);` Tried it, got no difference, however. — packetie, Jul 12 '16 at 20:20
What are the relocations for your te1.o (`readelf -r te1.o`)? Have you tried to compile te2.c with tcc but to link with gcc? There are three cases: 1) te1.o is wrong 2) the tcc-linker does something wrong, 3) something else. Would be nice to rule 1+2 out.. — ead, Jul 16 '16 at 09:22
I cannot reproduce your error on linux. The difference might be, that I don't need objcopy. — ead, Jul 16 '16 at 09:32
I think the tcc compiler and linker do something different: for the function call `printf1(..)`, tcc emits `lea -0x4(%rip),%rcx`. Notice the `-4` there? That would explain why, when tcc links the gcc-compiled te1.o, the pointer to the constant string is off by 4 bytes. Yes, I tested it on Linux, and tcc and gcc worked well together there too. My Linux gcc is a different version (4.8.4) from the Windows version (5.3.0). `objdump -d` shows these two versions of gcc generate different code. — packetie, Jul 16 '16 at 13:35
Could you please show us the relocations (`readelf -r te1.o`)? -4 can only have a meaning, when we know the relocation. On linux, gcc uses R_X86_64_32 relocation but tcc R_X86_64_PC32, maybe here got something mixed up. I would also try to link with another linker (not tcc). — ead, Jul 16 '16 at 14:55
Your experiment almost rules out possibility "3) something else". Now we have to find out whether the object file is corrupt or the linker goes wrong... — ead, Jul 16 '16 at 15:08
@ead, posted the output of `objdump -r te1.o`. Don't know if tcc is wrong but it's definitely doing something different, as is clear from the assembly code `lea -4(%rip), %rcx` it generates. — packetie, Jul 17 '16 at 03:10
Thanks, I don't know the pe-format at all. But for the elf-format the relocation should be `000000000000000b R_X86_64_PC32 .rdata-4` — ead, Jul 17 '16 at 09:59

score 7 · Answer 1 · edited Jul 17 '16 at 12:24


This has nothing to do with tcc or calling conventions. It has to do with different linker conventions for the elf64-x86-64 and pe-x86-64 formats.

With PE, the linker implicitly subtracts 4 when calculating the final offset.

With ELF, it does not. Because of this, 0 is the correct initial value for PE, and -4 is correct for ELF.

Unfortunately, objcopy does not convert this value, which is a bug in objcopy.

edited Jul 17 '16 at 12:24 by x6iae

answered Jul 17 '16 at 09:37 by h1n1

It would make a lot of sense! Could you please provide some links? I am mostly interested in "It has to do with different linker conventions for elf64-x86-64 and pe-x86-64 formats." – ead Jul 17 '16 at 09:56
See also: https://sourceware.org/bugzilla/show_bug.cgi?id=970 - wontfix, see: https://sourceware.org/ml/binutils/1999-q3/msg00611.html – h1n1 Jul 17 '16 at 16:14
I noticed that using 32 bit compiler + objcopy the bug isn't there. Maybe the workaround is that, while they're hopefully fixing objcopy. – Jean-François Fabre Jul 17 '16 at 17:02
Build a DLL with GCC (or another compiler) and link to it with TCC. – h1n1 Jul 17 '16 at 17:53
It looks like it is a bad idea to use objcopy for converting 64-bit object files. I tried some conversions on my Linux machine and a lot of them were garbage. 32-bit could be another story because it uses a different code model. – ead Jul 17 '16 at 18:48
This project will need to process files many GB in size, so it had better be a 64-bit application. I read `https://sourceware.org/ml/binutils/1999-q3/msg00611.html` and it appears the objcopy issue is not easy to fix. I can't build a DLL since the sources call functions from each other (a final link is needed for execution). The reason I had to use objcopy (hence the issues) is that I can't make MinGW gcc output object files in ELF format. Do you know of a workaround? Thanks. – packetie Jul 17 '16 at 19:55
@codingFun gcc can produce ELF format (after all, that is what it does on Linux). You could build gcc from source with the cross-compiler flag `--target=elf-something` (I don't remember exactly); then it would produce ELF object files – ead Jul 18 '16 at 07:30
Thanks @ead. That's good to know. Wish there is a binary I can just download. Compiling on windows is always a big hassle. – packetie Jul 20 '16 at 13:33

Jean-François Fabre · Answer 2 · 2016-07-13T21:02:10.587

add

extern void printf1(char *p);

to your te1.c file

Otherwise, with no prototype in scope, the compiler implicitly declares printf1 as returning int and cannot check the argument types, which is fragile when pointers are 64 bits long.

Edit: this still does not work. I found out that the function never returns (calling printf1 a second time does nothing!). It seems that the first 4 bytes are consumed as a return address or something like that. In gcc 32-bit mode it works fine. It sounds like a calling convention problem to me, but I still cannot figure it out. Another clue: calling printf from the te1.c side (gcc, using tcc stdlib bindings) crashes with a segfault.

I disassembled the executable. The first part is the repeated call from the tcc side:

  40104f:       48 8d 05 b3 0f 00 00    lea    0xfb3(%rip),%rax        # 0x402009
  401056:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  40105a:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  40105e:       e8 9d ff ff ff          callq  0x401000
  401063:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401067:       e8 94 ff ff ff          callq  0x401000
  40106c:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401070:       e8 8b ff ff ff          callq  0x401000
  401075:       48 8b 4d f8             mov    -0x8(%rbp),%rcx
  401079:       e8 82 ff ff ff          callq  0x401000
  40107e:       e8 0d 00 00 00          callq  0x401090
  401083:       b8 00 00 00 00          mov    $0x0,%eax
  401088:       e9 00 00 00 00          jmpq   0x40108d
  40108d:       c9                      leaveq
  40108e:       c3                      retq

The second part is the repeated call (6 times) to the same function. As you can see, the target address is different (shifted by 4 bytes, like your data)! It kind of works just once because the first 4 bytes of the function are the following two instructions:

 401000:       55                      push   %rbp
 401001:       48 89 e5                mov    %rsp,%rbp

so the stack is corrupted if those are skipped!

  40109f:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  4010a3:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010a7:       48 89 c1                mov    %rax,%rcx
  4010aa:       e8 55 ff ff ff          callq  0x401004
  4010af:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010b3:       48 89 c1                mov    %rax,%rcx
  4010b6:       e8 49 ff ff ff          callq  0x401004
  4010bb:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010bf:       48 89 c1                mov    %rax,%rcx
  4010c2:       e8 3d ff ff ff          callq  0x401004
  4010c7:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010cb:       48 89 c1                mov    %rax,%rcx
  4010ce:       e8 31 ff ff ff          callq  0x401004
  4010d3:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010d7:       48 89 c1                mov    %rax,%rcx
  4010da:       e8 25 ff ff ff          callq  0x401004
  4010df:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4010e3:       48 89 c1                mov    %rax,%rcx
  4010e6:       e8 19 ff ff ff          callq  0x401004
  4010eb:       90                      nop

@Jean-François Fabre Thanks for the suggestion, I tried it, but got exactly the same result as before. — packetie, Jul 12 '16 at 20:14
That's great to know! Please let me know what you find out. I am going to add a bounty for this :-) — packetie, Jul 13 '16 at 20:42
In my case, I saw the following in te2.exe: `lea 0x2fa5(%rip),%rcx # 0x404004`. The address should have been 0x404000, where the string starts. The extra 4 could be related to `lea -0x4(%rip),%rcx` (Update 1 from the OP). It seems to be a bug in tcc, but I am not sure. — packetie, Jul 15 '16 at 03:16
