GCC inline assembly: what's wrong with the dynamically allocated register `r` in an input operand?

Question

While testing GCC inline assembly, I use the `test` function below to display a character on the screen in the Bochs emulator. The code runs in 32-bit protected mode:

```c
void test(void) {
    char ch = 'B';
    __asm__ ("mov $0x10, %%ax\n\t"
             "mov %%ax, %%es\n\t"
             "movl $0xb8000, %%ebx\n\t"
             "mov $0x04, %%ah\n\t"
             "mov %0, %%al\n\t"
             "mov %%ax, %%es: ((80 * 3 + 40) * 2)(%%ebx)\n\t"
             ::"r"(ch):);
}
```

The result: the red character on the screen is not a `B`. However, when I change the input constraint from `r` to `c` in the last line of the code above, i.e. `::"c"(ch):);`, the character 'B' displays correctly.

What is the difference? I access video memory directly through the data segment after the CPU has entered protected mode.

I traced the generated code and found that, with the `r` constraint, the instruction is assembled as `mov al, al`. Since `ax` was just loaded with 0x0010, `al` is 0x10, which explains the output. But why did gcc choose the `al` register? Isn't it supposed to choose a register that hasn't been used yet? When I added a clobber list, the problem went away.
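For comparison, here is a minimal user-space sketch of what the clobber list changes. `make_cell` is a hypothetical helper, not the kernel function above: it builds the same red-on-black cell (attribute `0x04` in the high byte, character in the low byte) and returns it instead of storing through `%es`. Because `"eax"` is listed as clobbered, gcc won't assign `%al` to the input operand, so `mov %1, %al` can no longer become `mov al, al`. It assumes an x86 target; on other targets the `#else` branch computes the same value in plain C.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the question's template: put the red
   attribute (0x04) in %ah and the character in %al, then copy %ax out.
   Declaring "eax" as clobbered tells gcc that %eax/%ax/%al are overwritten,
   so it will not pick %al for input operand %1 (or %ax for output %0). */
uint16_t make_cell(char ch) {
    uint16_t cell;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movb $0x04, %%ah\n\t"
             "movb %1, %%al\n\t"
             "movw %%ax, %0"
             : "=rm" (cell)   /* output: any register or memory, never %eax */
             : "q" (ch)       /* input: any byte-addressable register, never %al */
             : "eax");        /* the template overwrites %eax */
#else
    cell = (uint16_t)(0x0400u | (unsigned char)ch);   /* portable fallback */
#endif
    return cell;
}
```

With the clobber in place, `make_cell('B')` returns `0x0442`: `0x04` in the high byte and `'B'` (`0x42`) in the low byte.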

I don't know much about this, but [the manual](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) seems friendly. — Lundin, May 16 '16 at 14:39
I have read the manual before, maybe not carefully, but I can't find the answer there now. — zhenguoli, May 16 '16 at 14:41
Protected mode, and protected mode itself works well. I may have found the cause, but I'm not sure I'm right. — zhenguoli, May 16 '16 at 15:46
One potential snafu you could possibly have is that you modify many registers in your assembler template but you don't list them as outputs or clobbers. — Michael Petch, May 16 '16 at 15:49
I also find that when I add the `clobbers` list, the display is normal. — zhenguoli, May 16 '16 at 15:51
What is happening is that besides the substitutions it has no knowledge of what the instructions actually do in your assembler template. It thinks it can use `al` because you haven't told it that it shouldn't use it. You may be under the false impression that _GCC_ analyzes the code to determine what it is allowed to use or what it isn't allowed. All that information is supplied via the input/output/clobbers etc. — Michael Petch, May 16 '16 at 15:51
Yes, I haven't told it that it shouldn't use it before, but now it works well. I will read the `gcc manual` more carefully. What you say is what I thought before. Thanks. — zhenguoli, May 16 '16 at 15:53
Assembler templates if not used properly can appear to work in some cases and then eventually break. Assembler templates can be the source of all kinds of problems. If in the hands of people who understand them they can be very useful. If you are new to inline assembler templates in _GCC_ you can be in for all kinds of hurt. You might be better off starting by coding some of the routines as assembly and add those objects to your code. — Michael Petch, May 16 '16 at 15:58
I don't believe GCC inline assembly templates allow _ES_ in the clobber list, so if you really need to change _ES_ you'll have to add a `push %%es` at the top and a `pop %%es` at the end. Assuming the naive/simple (not best) solution: since you alter _EAX_ and _EBX_ in your assembler template, you will have to list both in your clobber list. Any register listed in the clobber list can't be used by _GCC_ as an input or output register. That suggested change would allow `ch` to be an input operand with constraint `"rmi"`. — Michael Petch, May 16 '16 at 16:24
The ideal (not naive/simple) is to list some dummy variables to be used as output operands so that the compiler can choose other registers besides _EAX_ and _EBX_ . This would mean you wouldn't need them listed as clobbers anymore and it would allow the compiler to better optimize register usage (if using -O1, -O2, -O3 etc) — Michael Petch, May 16 '16 at 16:28
I'm very curious about one thing. As far as I can tell, the whole reason you are using assembler is to override the default segment. Are you sure you need to? I can't tell the context of your code (it looks like it would be in the kernel itself). Usually the easiest thing to do is set DS=ES=SS (even FS and GS if you want) to the same descriptor (0x10 in this case) when your kernel loads (a flat descriptor covering all 4 GiB is easiest). Maybe there is a reason you can't do that, but if you can, then you have no need to override _ES_ (or use it as part of the `mov` instruction). — Michael Petch, May 17 '16 at 01:32
I mention this because without the need to override the default segment on the `mov` instruction to use _ES_ you can code the entire thing in _C_ without the need for the inline assembly in this case. — Michael Petch, May 17 '16 at 01:33
Very good point, thanks for your advice. I am new to gcc inline assembly. `es` is already set so that `DS=ES=SS`, but I set it again in case it gets changed elsewhere. And `es` can't be in the clobber list; I have run into that. I have also seen some examples of how to write the whole thing in C. Thank you very much, and forgive my late reply @MichaelPetch — zhenguoli, May 17 '16 at 05:09
@zhenguoli: I wish you'd said that earlier, before I spent time writing a big answer >.<. Just don't change `es` anywhere. It's not something that happens without code on purpose changing it, and having it wrong will cause breakage everywhere, not just in this function. — Peter Cordes, May 17 '16 at 05:14
Whoa you are clobbering `es`. The compiler assumes that `es` is equal to `ds`, so if you want to change it, go for it, but put it back. — doug65536, May 17 '16 at 05:16
About your `mov al,al`, that is silly, why do you `mov` it? Just use inputs that set `a`, then the compiler can endeavour to already have the right value in `eax`, or otherwise make that happen as efficiently as possible. You should avoid `mov` in inline assembly, if at all possible. — doug65536, May 17 '16 at 05:17
Sorry, thanks for your answer. So I can't change the `es` in the inline assembly code. @PeterCordes — zhenguoli, May 17 '16 at 05:19
Good advice, avoiding `mov` in inline assembly, use inputs. @doug65536. Thanks. — zhenguoli, May 17 '16 at 05:26
@zhenguoli You said in a comment "I have encountered such a situation" . are you suggesting you have a situation where _ES_ seems to be changed to something else while your code is running? — Michael Petch, May 17 '16 at 18:46
@MichaelPetch, no, the situation is that when I list `es` in the clobber list, gcc reports an error, so GCC inline assembly templates don't allow ES in the clobber list. Thanks for the reminder. — zhenguoli, May 18 '16 at 11:10
I guess you thought you had to set ES because it was typical in 16-bit code to specify the _ES_ register when accessing memory address 0xb8000? (As in code like `mov ax, 0b800h`, `mov es, ax`, and then a `mov` to a memory address using _ES_ as an override.) — Michael Petch, May 18 '16 at 15:01
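Following @doug65536's suggestion to use input operands instead of `mov` instructions, here is a sketch where the finished cell value reaches `%ax` through an `"a"` input constraint, so the template contains only the store and needs no clobbers at all. It writes through an ordinary flat pointer instead of `%es:` (in the usage example a local array stands in for video memory), and `put_cell` is a hypothetical name. It assumes x86; other targets fall back to a plain C store.

```c
#include <stdint.h>

/* Hypothetical helper: store one VGA text cell through a flat pointer.
   The compiler loads the cell value into %eax itself (the "a" constraint),
   and the "=m" output tells it exactly which memory the asm writes. */
void put_cell(uint16_t *dst, unsigned char c) {
    uint16_t cell = (uint16_t)(0x0400u | c);    /* red attribute in the high byte */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("movw %[val], %[mem]"
                      : [mem] "=m" (*dst)
                      : [val] "a" (cell));
#else
    *dst = cell;
#endif
}
```

Using `"r"` instead of `"a"` would let gcc pick any register; `"a"` is shown only because it matches the register the original template used.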

score 4 · Accepted Answer · edited May 23 '17 at 11:44

As @MichaelPetch commented, you can use 32-bit addresses to access whatever memory you want from C. The asm that gcc emits assumes a flat memory space: for example, it may copy `esp` to `edi` and use `rep stos` to zero some stack memory, which requires `%es` to have the same base as `%ss`.

I'd guess that the best solution is not to use any inline asm, but instead just use a global constant pointer to the VGA text buffer. For example:

```c
#include <stdint.h>

// pointer is constant, but points to non-const memory
uint16_t *const vga_base = (uint16_t*)0xb8000;   // plus your segment's base, if it isn't zero

// offsets are scaled by 2.  Do some casting if you want the address math to treat offsets as byte offsets
void store_in_flat_memory(unsigned char c, uint32_t offset) {
  vga_base[offset] = 0x0400U | c;            // it matters that c is unsigned, so it zero-extends instead of sign-extending
}
```

```asm
    movzbl  4(%esp), %eax       # c, c
    movl    8(%esp), %edx       # offset, offset
    orb     $4, %ah   #, tmp95         # odd choice by gcc: this causes a partial-register stall, even with -mtune=core2
    movw    %ax, 753664(%edx,%edx)  # tmp95, *_3   # the addressing mode scales the offset by two (sizeof(uint16_t)) by using it as both base and index
    ret
```

Output from gcc 6.1 on Godbolt, with `-O3 -m32`.

Without the `const`, code like `vga_base[10] = 0x4 << 8 | 'A';` would have to load the `vga_base` global and then offset from it. With the `const`, `&vga_base[10]` is a compile-time constant.
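The "it matters that c is unsigned" comment comes down to C's integer promotions. A small sketch with two hypothetical helpers that differ only in the signedness of the parameter: for the byte `0xB0`, the signed version promotes to `-80`, which becomes `0xFFFFFFB0` before the OR, and those 1 bits overwrite the attribute byte. It assumes a 32-bit `int`, as with gcc on x86.

```c
#include <stdint.h>

/* Attribute 0x04 (red on black) in the high byte, character in the low byte. */
uint16_t cell_from_unsigned(unsigned char c) {
    return (uint16_t)(0x0400U | c);   /* 0xB0 zero-extends to 0x000000B0 */
}

/* Same expression with a signed byte: values >= 0x80 sign-extend, so the
   high byte of the result is all ones instead of the attribute. */
uint16_t cell_from_signed(signed char c) {
    return (uint16_t)(0x0400U | c);   /* -80 (0xB0) becomes 0xFFFFFFB0 */
}
```

For ordinary ASCII characters both agree (`'B'` gives `0x0442`); only bytes `0x80` and above expose the difference, which is why the answer's functions take `unsigned char`.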

If you really want a segment:

Since you can't leave %es modified, you need to save/restore it. This is another reason to avoid using it in the first place. If you really want a special segment for something, set up %fs or %gs once and leave them set, so it doesn't affect the normal operation of any instructions that don't use a segment override.

There is built-in syntax that uses `%fs` or `%gs` without inline asm: thread-local variables. You might be able to take advantage of it to avoid inline asm altogether.

If you're using a custom segment, you could make its base address non-zero, so you don't need to add `0xb8000` yourself. However, Intel CPUs optimize for the flat-memory case, so address generation with a non-zero segment base is a couple of cycles slower, IIRC.

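I did find a request for gcc to allow segment overrides without inline asm, and a question about adding segment support to gcc. (gcc 6 and later do support the `__seg_fs` and `__seg_gs` named address spaces on x86, which let plain C code access memory through `%fs` or `%gs`.)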

Doing it manually in asm, with a dedicated segment

To look at the asm output, I put it on Godbolt with the `-mx32` ABI, so args are passed in registers but addresses don't need to be sign-extended to 64 bits. (I wanted to avoid the noise of loading args from the stack in `-m32` code. The `-m32` asm for protected mode will look similar.)

```c
#include <stdint.h>

void store_in_special_segment(unsigned char c, uint32_t offset) {
    char *base = (char*)0xb8000;               // sizeof(char) = 1, so address math isn't scaled by anything

    // let the compiler do the address math at compile time, instead of forcing one 32-bit constant into a register and another into a disp32
    char *dst = base+offset;               // not a real address, because it's relative to a special segment.  We use a C pointer so gcc can pick whatever addressing mode it wants.
    uint16_t val = (uint32_t)c | 0x0400U;  // it matters that c is unsigned, so it zero-extends

    asm volatile ("movw  %[val], %%fs: %[dest]\n"
         :
         : [val] "ri" (val),  // register or immediate
           [dest] "m" (*dst)
         : "memory"   // we write to something that isn't an output operand
    );
}
```

```asm
    movzbl  %dil, %edi        # dil is the low 8 bits of %edi (AMD64-only, but 32-bit code probably wouldn't put a char there in the first place)
    orw     $1024, %di        #, val   # gcc causes an LCP stall, even with -mtune=haswell, and with gcc 6.1
    movw  %di, %fs: 753664(%esi)    # val, *dst_2
```

```c
void test_const_args(void) {
    uint32_t offset = (80 * 3 + 40) * 2;
    store_in_special_segment('B', offset);
}
```

```asm
    movw  $1090, %fs: 754224        #, MEM[(char *)754224B]
```

```c
void test_const_offset(char ch) {
    uint32_t offset = (80 * 3 + 40) * 2;
    store_in_special_segment(ch, offset);
}
```

```asm
    movzbl  %dil, %edi  # ch, ch
    orw     $1024, %di        #, val
    movw  %di, %fs: 754224  # val, MEM[(char *)754224B]
```

```c
void test_const_char(uint32_t offset) {
    store_in_special_segment('B', offset);
}
```

```asm
    movw  $1090, %fs: 753664(%edi)  #, *dst_4
```

So this code gets gcc to do an excellent job at using an addressing mode to do the address math, and do as much as possible at compile time.
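The constants in that output can be checked by hand: `753664` is `0xb8000`, the question's byte offset `(80 * 3 + 40) * 2` is `560`, so `754224` is `0xb8000 + 560`, and `1090` is `0x0442`, i.e. the red attribute `0x04` in the high byte with `'B'` (`0x42`) in the low byte. A small sketch that recomputes them; `VGA_TEXT_BASE`, `vga_byte_offset` and `vga_cell` are hypothetical names, assuming the usual 80-column, 2-bytes-per-cell text mode.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xb8000u   /* the disp32 753664 in the gcc output */

/* Byte offset of a text cell: 80 columns per row, 2 bytes per cell. */
uint32_t vga_byte_offset(uint32_t row, uint32_t col) {
    return (80u * row + col) * 2u;
}

/* Cell value: attribute 0x04 (red on black) in the high byte. */
uint16_t vga_cell(unsigned char c) {
    return (uint16_t)(0x0400u | c);
}
```

When both the character and the offset are constants, gcc folds all of this into the single `movw $1090, %fs: 754224` seen in `test_const_args`.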

Segment register

If you do want to modify a segment register for every store, keep in mind that it's slow: Agner Fog's instruction tables stop listing `mov sr, r` after Nehalem, but on Nehalem it's a 6-uop instruction that includes 3 load uops (from the GDT, I assume), with a throughput of one per 13 cycles. Reading a segment register (e.g. `push sr` or `mov r, sr`) is fine. `pop sr` is even a bit slower than `mov sr, r`.

I'm not even going to write code for this, because it's such a bad idea. Make sure you use clobber constraints to let the compiler know about every register you step on, or you will have hard-to-debug errors where surrounding code stops working.

See the x86 tag wiki for GNU C inline asm info.

His code *must* be running in real mode because he sets `es` to `0xb800` to write to the screen, right? Therefore there is no "flat" addressing, (yes I know about unreal mode, but it isn't that because he changes `es`) — doug65536, May 17 '16 at 05:23
@doug65536: In comments, the OP said `ES=DS=SS`, and he's setting it "in case it got changed". /facepalm. But no, the OP's code sets `%es = 0x10`, puts `0xb8000` into a register, and offsets from that. So it's doing what my code does: `%es: 0xb8000 + offset`. — Peter Cordes, May 17 '16 at 05:26
Yes, that's right, `es` gets `0x10`... sorry forget my comment :) I got confused about comments. — doug65536, May 17 '16 at 05:28
The `es` set to `0x10` in my code is in protected mode, and the selector `0x10` base address is `0x00000000`, so I set the `ebx` to `0xb8000` to access the video memory, but it doesn't matter. — zhenguoli, May 17 '16 at 05:29
The code in the question looks so much like some startup code I did to do the real mode bootstrap that my brain clicked into real mode. That code set `es` though. — doug65536, May 17 '16 at 05:33
@doug65536: yeah, it looked like real-mode code to me, at first, too. Like maybe the OP copied a chunk of 16bit code into his 32bit code, and made a tiny change. — Peter Cordes, May 17 '16 at 05:36
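The confusion in these comments comes from the two different address-translation rules. In real mode a segment register holds a paragraph number, so `es = 0xb800` maps to linear `0xb8000`. In protected mode `es = 0x10` is a selector, and the base comes from the GDT descriptor it selects (base 0 here, per the OP's comment). A sketch of both rules with hypothetical function names, ignoring limits, permission checks, A20 wraparound and paging:

```c
#include <stdint.h>

/* Real mode: linear address = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* Protected mode: the selector picks a descriptor, and
   linear address = descriptor base + offset (paging, if enabled, comes after). */
uint32_t protected_mode_linear(uint32_t descriptor_base, uint32_t offset) {
    return descriptor_base + offset;
}
```

Both routes reach the same cell: `real_mode_linear(0xb800, 560)` and `protected_mode_linear(0, 0xb8000 + 560)` are both `0xb8230`. Reading the selector `0x10` with real-mode rules would instead give `0x100`, which is why 16-bit segment habits don't carry over.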
