
I need to implement a small fragment of code in assembly for a 32-bit AVR (a memory test that tests the RAM under the running C program; there is no other way to solve it). However, I can't find any documentation on the AVR-32 specifics of inline assembler, and trial and error has not led me to success either.

First thing: Does anyone know about any docs describing the AVR-32 specifics of inline ASM? (Particularly the input / output register specifications)

I managed to get to the point where I was able to write the inline fragment with automatic input / output register allocation; however, a strange behavior prevents me from completing it. Take the following fragment of code:

int ret;
int ad0;
int ad1;
/* ... */
__asm__ volatile(
 "mov     r2,  %0  \n\t"
 "mov     r3,  %1  \n\t"
 "mov     %2,  0   \n\t"
: "=r"(ret)
: "r"(ad0), "r"(ad1)
: "r2", "r3"
);

Compiled with optimizations off using avr32-gcc it produces the following assembly output (-S):

#APP
#  95 "svramt.c" 1
 mov     r2,  r8  
 mov     r3,  r8  
 mov     r9,  0   

#  0 "" 2
#NO_APP

Notice how %0 and %1 mapped to the same register (r8). This seems to happen if an output register is present. To check whether I used inline assembly improperly here, I also tried X86 with the native gcc on the host:

int ret;
int ad0;
int ad1;
/* ... */
__asm__ volatile(
 "mov     %0,    %%eax \n\t"
 "mov     %1,    %%ebx \n\t"
 "mov     0,     %2,   \n\t"
: "=r"(ret)
: "r"(ad0), "r"(ad1)
: "eax", "ebx"
);

This fragment compiles to:

#APP
# 7 "asmtest.c" 1
 mov     %esi,    %eax 
 mov     %edx,    %ebx 
 mov     0,     %ecx,   

# 0 "" 2
#NO_APP

This is what I expected to happen with the AVR-32 counterpart, all inputs and outputs mapping to different registers.

I would have liked to work around the problem (if it is a bug in avr32-gcc) by specifying the registers directly (trying "=r8" and such as input / output specs), but it doesn't compile that way.

If there is no documentation, does anyone know where the register specs for inline asm can be found in the source of (a "normal" x86 or ARM) GCC? It would be worth a try, but GCC is a huge beast to wade through without any prior knowledge.

I don't believe I have enough karma to get this done with a separate ASM module (with near-zero knowledge of AVR-32 assembly), and that would at least need documentation of the calling convention anyway, which I haven't found so far either...

EDIT: Further experimenting showed that using =&r for the output specifier seems to solve the register mapping problem. Why so, I am clueless (two inputs mapped to the same register). At least the thing can be tested now, since it produces the intended assembly fragment.

Further research revealed this AVR 8 bit document, which offers part of a solution: it describes using square brackets to give the operands names, which can then be used in the assembly fragment. This eliminates any ambiguity about which operand maps to which %n specification in the fragment. (I couldn't find this syntax described in other documents, but it works in avr32-gcc as well, so it was useful.)
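For illustration, a minimal sketch of the named-operand syntax. It targets x86 (AT&T syntax) rather than AVR-32 so that it can be tried with a host gcc; the function name `add_named` and the operand names `acc` and `rhs` are made up for this example, and other targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b, referring to the asm operands by name. */
static inline uint32_t add_named(uint32_t a, uint32_t b)
{
    uint32_t acc = a;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "addl %[rhs], %[acc]"     /* acc += rhs (AT&T order: source, destination) */
        : [acc] "+r" (acc)        /* read-write operand, named acc */
        : [rhs] "r"  (b)          /* input operand, named rhs */
        : "cc"                    /* addl changes the flags */
    );
#else
    acc += b;                     /* plain C fallback for other targets */
#endif
    return acc;
}
```

Because the template refers to `%[acc]` and `%[rhs]` instead of `%0` and `%1`, reordering or adding operands later does not silently change which operand an instruction uses.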

emacs drives me nuts
Jubatian
  • The compiler didn't map two inputs to the same register, it mapped an input and an output to the same one and that is legal. To avoid that you need to use the early clobber `&` as you have found out. Inline asm is [documented in the manual](https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html) with the AVR specific stuff listed in the [machine constraints section](https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html). – Jester Sep 21 '15 at 11:01
  • @Jester: Sadly the AVR-**32** port is maintained by Atmel only, and so its specifics are not listed in that document. It only lists 8 bit AVR which is useless for AVR-32. The mapping was also something odd, in a lot of examples it seemed like '%0' here would be bound with the first input, not the output like your note suggests (I wanted to use specific register requests also to get around this, to make the binding clearly visible, not something which could be screwed up later. Speed is not a concern here, just to make the thing work reliably). – Jubatian Sep 21 '15 at 11:21
  • Did you ask Atmel? At least they have to provide the source code (thanks GPL!), so there should be a (laborious, but possible) way to find out. Personal note: if Atmel does not provide the data, I'd seriously think about changing to a different MCU. The AVR32 are legacy already (Atmel pushes ARM Cortex-M, as most other vendors), so you will have to some day anyway. – too honest for this site Sep 21 '15 at 11:26
  • @Olaf: Ouch... I have some not so nice experiences with them by a processor bug (http://www.avrfreaks.net/forum/working-around-short-interrupt-instruction-skip-bug) which took three months to find and never got any response from them. The alternatives are neither too nice (Microchip with their optimize-for-mega-$$$ GCC variant, http://jubatian.com/articles/turning-on-optimizations-in-microchips-xc32/), maybe really ARM is the way to go, which the company didn't want when introducing 32 bits stuff. So now we have AVR-32s and some PIC32s... Great. – Jubatian Sep 21 '15 at 11:31
  • I did not say Atmel would be helpful (I actually had similar experiences with them, which is why I am currently not recommending their products to customers; Atmel has been quite arrogant in the past; not sure if they still are). Note that ARM is not ARM, even more so when it comes to MCUs. Different vendors use quite different (and differently performing) approaches for peripherals. – too honest for this site Sep 21 '15 at 11:44
  • Last posting, as this is off-topic: "with ARM, this feels impossible" Huh? The only libs you need are the CMSIS headers from ARM/vendor and the peripheral register definitions header(s). For most vendors there is no problem getting them. You might have some misconception about what ARM actually does and how they are related to the MCU/SoC vendors. One thing often ignored is that ARM MCUs from different vendors cannot simply be used as alternate sources, nor can the software easily be ported from one vendor's MCU to another. "Same CPU" is far from "same peripherals" or "same drivers". – too honest for this site Sep 21 '15 at 11:53
  • You really should avoid the vendor-specific "HAL" libraries (ST has one of the worst examples for this). I actually prefer to write my own low-level drivers just using the datasheet/ref-man and get along with the STM32 quite well (except for the usual problems). Just do not even **try** to use the STlib rubbish. – too honest for this site Sep 21 '15 at 12:06

1 Answer


> Take the following fragment of code:

    __asm__ volatile(
     "mov     r2,  %0  \n\t"
     "mov     r3,  %1  \n\t"
     "mov     %2,  0   \n\t"
    : "=r"(ret)
    : "r"(ad0), "r"(ad1)
    : "r2", "r3"
    );

> Notice how %0 and %1 mapped to the same register (r8).

From the description of the operands, that is completely fine: we don't see the code that follows the asm, but evidently the lifetimes of ad0 and ad1 end at the asm, and the lifetime of ret starts there.

This means gcc may allocate registers in such a way that the output overlaps the input(s). If you start writing outputs before all inputs have been read, then, depending on the register choice, you might overwrite (clobber) the input(s). Such a situation is called early-clobber, and in order to tell the compiler that it might occur, there is the constraint modifier & ("early clobber") for output operands.
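A minimal sketch of a case that does need the modifier, written for x86 (AT&T syntax) so that it can be tried on a host compiler; the function name is hypothetical, and other targets fall back to plain C. The first instruction writes %0 while %2 has not been read yet, so without the & the compiler would be free to put %0 and %2 in the same register.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b; the output is written before
 * the second input is read, hence the early-clobber modifier. */
static inline uint32_t add_early_clobber(uint32_t a, uint32_t b)
{
    uint32_t out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "movl %1, %0 \n\t"   /* out = a: %0 is written here ...      */
        "addl %2, %0 \n\t"   /* ... but %2 (b) is only read here     */
        : "=&r" (out)        /* & keeps %0 apart from every input    */
        : "r" (a), "r" (b)
        : "cc"               /* addl changes the flags */
    );
#else
    out = a + b;             /* plain C fallback for other targets */
#endif
    return out;
}
```

With a plain "=r" the same template may still happen to work, depending on which registers the compiler picks, which is exactly why the overlap is easy to miss.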

Apart from that, you described op0 as output and op1 and op2 as inputs¹, but the asm template uses op0 and op1 as inputs and op2 as output. Hence, the inline asm in which the operand numbers match the actual assembly code is:

    __asm__ volatile(
     "mov     r2,  %1  \n\t"
     "mov     r3,  %2  \n\t"
     "mov     %0,  0   \n\t"
    : "=r" (ret)           /* output(s) */ 
    : "r" (ad0), "r" (ad1) /* input(s) */
    : "r2", "r3"
    );

Notice that no early-clobber specifier is needed because you consume all the inputs before writing the output, so it's fine if ret overlaps ad0 or ad1.
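The same ordering can be tried on a host compiler. A minimal x86 sketch (AT&T syntax, hypothetical function name, plain C fallback on other targets) in which the output is %0, the inputs are %1 and %2, and both inputs are copied into clobbered scratch registers before the output is written:

```c
#include <stdint.h>

/* Hypothetical helper: returns ad0 + ad1. Outputs are numbered first
 * (%0), then the inputs in order (%1, %2). */
static inline uint32_t add_via_scratch(uint32_t ad0, uint32_t ad1)
{
    uint32_t ret;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        "movl %1, %%eax    \n\t"   /* read input %1 (ad0)            */
        "movl %2, %%ecx    \n\t"   /* read input %2 (ad1)            */
        "addl %%ecx, %%eax \n\t"
        "movl %%eax, %0    \n\t"   /* only now write output %0 (ret) */
        : "=r" (ret)               /* output(s) */
        : "r" (ad0), "r" (ad1)     /* input(s)  */
        : "eax", "ecx", "cc"       /* scratch registers and flags */
    );
#else
    ret = ad0 + ad1;               /* plain C fallback for other targets */
#endif
    return ret;
}
```

Since every input has been read by the time %0 is written, a plain "=r" is sufficient here even if the compiler gives ret the same register as ad0 or ad1.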

> Further experimenting showed that using =&r for the output specifier seems to solve the register mapping problem. Why so, I am clueless

Describing the output operand as "early-clobber" means that the compiler must use a fresh register for the output operand, so that it does not overlap any input operand. In your original code, you used inputs in place of outputs and vice versa, but the =&r on an output that's actually used as an input would still fix the early-clobber situation (although the inline asm would still be incorrect).

> I also tried X86 with the native gcc on the host [...] This fragment compiles to: [...] This is what I expected to happen with the AVR-32 counterpart,

Just the fact that the assembly code generated from some inline asm (or any other code, for that matter) is as you expect does not imply that the inline asm is correct in the first place. Incorrect inline assembly might still generate the expected assembly output, so you can't use this to prove the code is correct. Only when the generated code is not as expected do you know that the sources are wrong (or, in rare cases, that there is a tool bug).

> I would have liked to work around the problem [...] by specifying the registers directly (trying "=r8" and such as input / output specs), but it doesn't compile that way.

The constraint strings must contain constraints. "=r8" means "an output operand that is in register class r or matches operand number 8", which makes no sense. What you can do is use local register variables like below, but it's likely that this is not what you want:

    register int reg8 asm ("r8") = 42;
    int ret;
    asm  ("mov %0, %1" : "=r" (ret) : "r" (reg8));

This assumes r8 is a valid register name for the target. Alternatively, you can use the register number, as in reg8 asm ("8"), provided the register number of r8 is 8.


¹ I am not familiar with AVR32 asm at all, so I assumed the semantics of the mov instruction are mov dest, source.

emacs drives me nuts
  • I like to link to the docs when explaining things. It adds an air of legitimacy, as well as providing some additional information about the feature under discussion (as well as showing related features). Early clobber is [here](https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html) (although your explanation is more detailed) and local register variables are [here](https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html). I guess technically these aren't "the docs" since atmel kinda goes their own way, but they all have the same roots, which means the links are probably still helpful. – David Wohlferd Oct 23 '22 at 06:01
  • Ow! Added an upvote on it, but was so long ago that the MCU I was working on by now is left behind in a different country! Wouldn't be able to check this any more. Seems like I mixed up association order then? (the "=r" output gets assigned %0, not the first input) – Jubatian Jan 14 '23 at 21:44