
I'm woefully bad at understanding the GNU inline assembly syntax, so I'm hoping a practical example may help. Given the following assembly (x86-64, output by Clang), how would I construct a function using inline assembly that would be identical? GCC produces different code for the same function, and I would like to get it to produce a version identical to what Clang (-O3) outputs.

bittest(unsigned char, int):
    btl %esi, %edi
    setb    %al
    ret

Here is what GCC (-O3) is producing:

bittest(unsigned char, int):
    movzx    eax, dil
    mov    ecx, esi
    sar    eax, cl
    and    eax, 1
    ret

Here is the C code for the function:

bool bittest(unsigned char byte, int index)
{
    return (byte >> index) & 1;
}
Chris_F
  • I'm with Patrick, you are making a bad call trying to match clang exactly. You could simply do asm("btl %esi, %edi\nsetb %al\nret"), but you will almost certainly regret it. If you decide you are willing to explore reasonable alternatives, I've got some. Also, if you are going to be doing work with inline asm, have you checked out the new docs? They've recently been re-written. They are better organized, and contain many examples: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html – David Wohlferd Jun 03 '14 at 07:08
  • So what does GCC generate? Are you sure it's not at least as good? – Mats Petersson Jun 03 '14 at 07:42
  • You're asking a question about a programming detail without giving hints to your larger goal. Are you optimizing for performance? For bitwise correctness? Just to learn? Or for a totally different reason? This could significantly affect answers given - for now I have no clue what your goal is and what would be an acceptable answer. – Klaas van Gend Jun 03 '14 at 09:10
  • @KlaasvanGend I don't actually have access to a local copy of Clang atm, only MinGW. I wanted to compare the performance. – Chris_F Jun 03 '14 at 09:54
  • Related to http://stackoverflow.com/questions/2039861/how-to-get-gcc-to-generate-bts-instruction-for-x86-64-from-standard-c – FrankH. Jun 03 '14 at 15:48

3 Answers


Well, last time I wrote a 32-bit bittest, it looked something like this (the 64-bit version looks slightly different):

unsigned char _bittest(const long *Base, long Offset) 
{ 
   unsigned char old; 
   __asm__ ("btl %[Offset],%[Base] ; setc %[old]" : 
      [old] "=rm" (old) : 
      [Offset] "Ir" (Offset), [Base] "rm" (*Base) : 
      "cc"); 

   return old; 
}

Although if you want to put it in a public header, I have a different version. When I use -O2, it ends up inlining the whole thing to make really efficient code.

I'm surprised gcc doesn't generate the btl here itself (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473), but you are right, it doesn't.
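For the signature in the question, the same pattern might look roughly like the sketch below. This is only an illustration, not a drop-in replacement: the widening to `unsigned int`, the `"=q"` byte-register constraint, and the plain-C fallback for non-x86 targets are my assumptions, and the compiler is still free to pick registers other than the ones Clang chose.

```c
#include <stdbool.h>

/* Sketch: the question's bittest() written with GNU extended asm.
   Assumes GCC or Clang targeting x86 or x86-64; anything else falls
   back to the plain C expression from the question. */
static inline bool bittest(unsigned char byte, int index)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    bool bit;
    unsigned int value = byte;             /* bt has no 8-bit form, so widen first */
    __asm__ ("btl %[index], %[value]\n\t"  /* CF = bit (index mod 32) of value */
             "setc %[bit]"                 /* copy CF into the byte result */
             : [bit] "=q" (bit)            /* "q": a register with a byte subregister */
             : [value] "r" (value), [index] "Ir" (index)
             : "cc");
    return bit;
#else
    return (byte >> index) & 1;
#endif
}
```

Because the operands are given as constraints rather than hard-coded registers, this can still inline into callers, which a fixed `btl %esi, %edi` string cannot do safely.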

David Wohlferd
  • It would be an optimization to remove the `"m"` alternative from `Base`. x86 memory bitstring instructions on modern CPUs are much slower (like ~10 uops) than some address math + a load + `bt r,r`. – Peter Cordes Aug 06 '16 at 07:25
  • I suppose it depends on how expensive freeing up a register is. If gcc has a scratch register laying around, you could be right. If it has to flush/reload something, maybe not. IAC, I don't think I've ever seen gcc use the "m" when "rm" is specified. But if I were going to update this old answer, I'd be more tempted to add the @cc stuff instead of using `setc`. – David Wohlferd Aug 06 '16 at 09:48
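The two suggestions in these comments (a register-only operand for the value, and a flag output instead of `setc`) could be combined along the following lines. Treat it as a hedged sketch: the name `bittest_cc` is made up for illustration, it relies on flag output operands (`"=@ccc"`), which need a reasonably recent GCC or Clang, and it checks `__GCC_ASM_FLAG_OUTPUTS__` and falls back to plain C when they are unavailable.

```c
#include <stdbool.h>

/* Sketch: bit test that hands the carry flag straight back to the compiler.
   bittest_cc is a hypothetical name, not something from the answer above. */
static inline bool bittest_cc(unsigned int value, unsigned int index)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__i386__) || defined(__x86_64__))
    bool carry;
    __asm__ ("btl %[index], %[value]"       /* CF = bit (index mod 32) of value */
             : [carry] "=@ccc" (carry)      /* flag output: read CF directly */
             : [value] "r" (value),         /* register only, no "m" alternative */
               [index] "Ir" (index));
    return carry;
#else
    /* Portable fallback that mirrors bt's modulo-32 indexing. */
    return (value >> (index & 31)) & 1;
#endif
}
```

With a flag output the compiler can branch on the carry flag directly when the result feeds an `if`, rather than materializing it in a register with `setc` and testing it again.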

I think it's unlikely that you can nail down a byte-by-byte equivalent version in your compiler; there are minor differences that aren't worth worrying about. Following this question, make sure you're compiling with the correct flags. Trying to get two compilers to produce identical output is probably an exercise in futility.

Patrick Collins

If you want to generate the exact same code, then you can do the following:

const unsigned char bittestfunction[] = { 0x0f, 0xa3, 0xf7, 0x0f, 0x92, 0xc0, 0xc3 }; /* bt %esi,%edi ; setb %al ; ret */
int (*bittest)( unsigned char, int ) = (int(*)(unsigned char, int))bittestfunction;

You can call this the same way as an ordinary function: bittest(foo, bar).

From objdump on the (gcc) compiled executable

00000000004006cc <bittestfunction>:
  4006cc:       0f a3 f7                bt     %esi,%edi
  4006cf:       0f 92 c0                setb   %al
  4006d2:       c3                      retq
ceilingcat
  • That's not useful because it doesn't inline. What this question really wants is an inline-asm version that can inline. Even worse, this has to be called through a function pointer (with an indirect `call`) because you wrote it as data, instead of just writing a stand-alone function in asm that can be linked normally. – Peter Cordes Aug 06 '16 at 07:22