x86 - setting a bit using inline assembly

Question

I'm writing a function to set the nth bit in a number x using bts instruction through inline assembly. Here is my function:

uint32_t set_bit_assembly(uint32_t x, uint32_t n)
{
    asm( "movl %1, %%eax; bts %0, %%eax;"
         :"=&r"(x)
         :"r"(n)
        );
    return x;
}

I want variables 'n' and 'x' to be the 1st operand for movl and bts respectively. However when I compile, it takes 'x' for movl and totally disregards 'n'. (I tried interchanging %0 and %1, which didn't help). Could you please tell me where I went wrong? Below is the generated assembly code:

00000043 <set_bit_assembly>:
  43:   55                      push   %ebp
  44:   89 e5                   mov    %esp,%ebp
  46:   83 ec 10                sub    $0x10,%esp
  49:   8b 55 0c                mov    0xc(%ebp),%edx
  4c:   89 d0                   mov    %edx,%eax
  4e:   0f ab c0                bts    %eax,%eax
  51:   89 45 fc                mov    %eax,-0x4(%ebp)
  54:   8b 45 fc                mov    -0x4(%ebp),%eax
  57:   c9                      leave  
  58:   c3                      ret

You know about the `|` operator though, do you? (I accept if you *want* to use inline assembly, just pointing out you don't *have* to.) — DevSolar, Nov 24 '15 at 11:18
Yes of course, I am aware of the bitwise operators. I just want to try some bit manipulations in assembly. — balajimc55, Nov 24 '15 at 18:21
Just wanted to make sure. You won't believe what some people get their mind set upon. ;-) — DevSolar, Nov 24 '15 at 19:03

score 4 · Accepted Answer · edited May 23 '17 at 11:59

4

How to use the bts instruction in asm

In your code, this line:

bts %0, %%eax;

Should be replaced with:

bts %%eax, %0;

Explanation

given the general form asm( "code" : outputs : inputs : clobbers) GCC replaced %0, %1 and %2 in the “code” with registers holding the arguments after the colon. The definition of BTS says that the first operand is the bit string and the second the bit index. So the solution seems to be: bts %0, %1 has you have done in your code. However this is not how bts works: bts wants the address as the second operand and the bit to set as the first so: bts %1, %0. See the correct usage here.

Better solutions

While your code will work with the suggested correction, there are better options like the following:

uint32_t set_bit_assembly2(uint32_t x, uint32_t n)
{
    asm( "bts %1,%0"
         :"+r"(x)
         :"r"(n)
        );
    return x;
}

As pointed out by @DavidWohlferd in the comment we should use "+r" as x will be both read and write by the bts instruction.

Moreover it is possible to increase the readability by using the symbolic names:

asm( "bts %[bit],%[value]"
     : [value] "+rm"(value) 
     : [bit] "r"(bit)
     :"cc");

Yet another possibility is (see this post):

uint32_t set_bit_assembly3(uint32_t x, uint32_t n)
{
    asm( "bts %1,%0": "+rm"(x) : "r"(n));
    return x;
}

Further readings:

This page that may be of great interest for people that wants to use bts: http://lxr.free-electrons.com/source/arch/x86/include/asm/bitops.h#L41

In this post Peter Cordes explains why bts on a memory operand is terrible for performance.

edited May 23 '17 at 11:59

Community

1
1

answered Nov 24 '15 at 11:15

terence hill

3,354
18
31

Can you explain the order in which the the variables are taken as operands for the assembly instruction? – balajimc55 Nov 24 '15 at 18:18
The BTS operation select the bit in the first operand (in our case x) at the position specified by the second operator (n). You can also say that "n" is the offset of the bit you want to set in the bit string "x". So it is not necessary to move the operand to a register, you can use directly bts. – terence hill Nov 24 '15 at 23:05
1

Looking at your first example, shouldn't that be `"+&r"`? Or even `"+&rm"`? And since this affects the flags, you should add the "cc" clobber. And speaking for myself, I find using symbolic names clearer. So something more like `asm( "bts %[bit],%[value]": [value] "+&rm"(value) : [bit] "r"(argc):"cc");`. – David Wohlferd Nov 24 '15 at 23:23
@DavidWohlferd and Terence: BTS with a memory operands is slow, so you'll actually get better code if you force gcc to have `x` in a register. (BTS with a memory operand can't figure out what memory address it's going to modify until after looking at the src operand, because of the crazy CISC semantics of the bit-string instructions. It decodes to >10 uops on Intel, vs. just one with a register operand.) See http://agner.org/optimize/ for insn tables. So avoid `BT*` with a mem operand unless optimizing for code size. – Peter Cordes Nov 25 '15 at 03:59
@PeterCordes Are you saying that if the value is currently in a memory location and/or needs to be written back to memory immediately after the BTS, it is still better to use a register? If not, allowing gcc the flexibility to choose between them still seems like the best strategy. It nearly always picks the "r" anyway. – David Wohlferd Nov 25 '15 at 04:49
@DavidWohlferd: yes, that's what I'm saying. The BT* instructions have crazy CISC bit-string semantics, and are implemented with lots of microcode on Intel CPUs. I just posted that as [an answer](http://stackoverflow.com/a/33908467/224132) on http://stackoverflow.com/questions/1983303/using-bts-assembly-instruction-with-gcc-compiler – Peter Cordes Nov 25 '15 at 06:36
@terence-hill could you tell me about the order in which the C variables are taken as operands to the instructions in an inline assembly code? – balajimc55 Nov 25 '15 at 09:00
1

@balajimc55 given the general form asm( "code" : outputs : inputs : clobbers) GCC replaced %0, %1 and %2 in the “code” with registers holding the arguments after the colon. The definition of BTS says that the first operand is the bit string and the second the bit index. So the solution seems to be: bts %0, %1 has you have done in your code. However this is not how bts works: bts wants the address as the second operand and the bit to set as the first so: bts %1, %0. See this (http://lxr.free-electrons.com/source/arch/x86/include/asm/bitops.h). – terence hill Nov 25 '15 at 14:46
@DavidWohlferd Thanks for pointing out the "+&r" and the possibility of use symbolic names. I update my answer to include your comment. – terence hill Nov 25 '15 at 22:04
@PeterCordes I tried to improve the readability of the answer and added a reference to your post, I'll appreciate any comments, thanks. – terence hill Dec 20 '15 at 17:35
The extra bold headers divide things up better. I like to use horizontal lines (`---`) to divide things when a headline isn't good. Suggestion for your very last version though: use a `+rm` constraint. You don't want to *force* the compiler to store it to memory and then operate on it there. Esp. since it's going to be the return value of the function, so you probably need it in `eax` anyway, unless the function is inlined into a call site where this is used on memory. Also mention *why* you're linking to my post: that `bts` on a memory operand is *terrible* for performance. – Peter Cordes Dec 20 '15 at 20:16
1

Oh, also, `bts` can take an immediate operand, so the bit-number constraint should be `ri`. Otherwise gcc would have to separately `mov` the count into a register when inlining into a caller that used a compile-time constant count. And use the right constraint in the first place, rather than leaving `=&r` and then in the next paragraph saying that's wrong. – Peter Cordes Dec 20 '15 at 20:18
@PeterCordes thanks. I included some of your suggestions. I will add more of it ASAP. – terence hill Dec 21 '15 at 14:44
1

I think there are still a couple small issues with explanation of the inline assembly. The first is that the '&' is not actually required. The '+' by itself is sufficient to mark the register as 'read/write'. The '&' signifies "early clobber", which can only occur when there are multiple assembly instructions. It's safe here, but doesn't change the compiler's behavior. See https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html for the official explanation. – Nathan Kurz Jan 19 '16 at 21:12
1

The second problem is bigger. Unless you can guarantee that n <= 31, you do not want both 'r' and 'm' as alternative constraints. The behavior of the BT functions with a bit offset greater than the register width is completely different when used with a memory operand as the destination. The register version does a modulo, but the memory version does not. See Table 3-2 here: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf. When Peter says "crazy CISC semantics", he really means it! – Nathan Kurz Jan 19 '16 at 21:26
@NathanKurz Removed the &, I guess that I left it just because was in the OP. – terence hill Feb 06 '16 at 20:51
@NathanKurz About the second problem, here the input should really be such that n <= 31, otherwise is an error. The input of the function is explicitly uint32. – terence hill Feb 06 '16 at 20:53
The source operand could still use a `ri` constraint, to allow the immediate source encoding (which is *much* more efficient if the destination is a memory operand). For performance reasons, you should probably just force the output operand to be a register, esp. if the source isn't usually an immediate. – Peter Cordes Feb 07 '16 at 13:20

x86 - setting a bit using inline assembly

1 Answers1

How to use the bts instruction in asm

Explanation

Better solutions

Further readings: