There are several issues with the code:
Issue 1: Wrong Constraint
The correct constraint for a call target is "i", i.e. an immediate value that is known at link-time.
Issue 2: Wrong % print-modifier
In order to print an address suitable for a call, use %x, which prints a plain symbol without gs(). Generating a linker stub at this place by means of gs() is not valid syntax, hence "garbage at end of line". Apart from that, as you are calling bar directly, there is no need for a linker stub (at least not for this kind of symbol usage).
Issue 3: call instruction might not be available
To factor out whether a device supports call or just rcall, there is %~, which prints a single r if just rcall is available, and nothing if call is available.
Issue 4: The Call might clobber Registers or have other Side-Effects
It's unlikely that the call has no effects on registers or on memory whatsoever. If your description of the inline asm does not match the side-effects of the code, it's likely that you will get wrong code sooner or later.
Taking it all together
Let's assume you have a function bar written in assembly that takes two 16-bit operands in R22 and R26, and computes a result in R22. This function does not obey the avr-gcc C/C++ calling convention, so inline assembly is one way to interface to such a function. We cannot write a correct prototype for bar anyway, so we just provide some prototype so that we can use the symbol bar. Register X has constraint "x", but R22 has no register constraint of its own, and therefore we have to use a local asm register:
extern "C" void bar (...);
int call_bar (int x, int y)
{
    register int r22 __asm ("r22") = x;
    __asm ("%~call %x2"
           : "+r" (r22)
           : "x" (y), "i" (bar));
    return r22;
}
Generated code for ATmega32 + optimization:
_Z8call_barii:
movw r26,r22
movw r22,r24
call bar
movw r24,r22
ret
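Regarding Issue 4: the asm above declares no clobbers, which is only correct if bar changes nothing except R22. As a hedged sketch in plain C, suppose (purely as an assumption for illustration) that bar also overwrites R18 and R19 and touches memory. The name call_bar_clobbering and the clobber list are hypothetical and have to be adapted to what the real routine does. The #else branch is only a host-side stand-in so that the snippet also compiles off-target; it says nothing about what bar really computes.

```c
#include <stdint.h>

#if defined(__AVR__)

/* Prototype only so that the symbol bar can be referenced; the routine
   itself does not follow the C calling convention. */
extern void bar (void);

static inline uint16_t call_bar_clobbering (uint16_t x, uint16_t y)
{
    /* R22 has no constraint of its own, so bind a local register variable. */
    register uint16_t r22 __asm ("r22") = x;

    __asm volatile ("%~call %x2"
                    : "+r" (r22)                /* %0: input and result in R22 */
                    : "x" (y), "i" (bar)        /* %1: X register, %2: symbol  */
                    : "r18", "r19", "memory");  /* ASSUMED side effects of bar */
    return r22;
}

#else

/* Host-side placeholder (assumption): pretends that bar adds its operands. */
static inline uint16_t call_bar_clobbering (uint16_t x, uint16_t y)
{
    return (uint16_t) (x + y);
}

#endif
```

If bar also changed the X register, listing it as an input would no longer be enough; it would have to become an in-out operand like r22.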
So what's that "generate stub" gs() thing?
Suppose the C/C++ code is taking the address of a function. The only sensible thing to do with it is to call that function, which will be an indirect call in general. Now an indirect call can target 64KiW = 128KiB at most, so on devices with more than 128KiB of code memory, special measures must be taken to indirectly call a function beyond the 128KiB boundary. The AVR hardware features an SFR named EIND for that purpose, but the problems with using it are obvious: You'd have to set it prior to a call and then reset it somehow, somewhere; all kinds of evil things would be necessary.
avr-gcc takes a different approach: For each such address taken, the compiler generates gs(func). This just resolves to func if the address is in the 128KiB range. If not, gs() resolves to an address in section .trampolines, which is located close to the beginning of flash, i.e. in the lower 128KiB. .trampolines contains a list of direct JMPs to targets beyond the 128KiB range.
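The decision described above can be modelled in a few lines of host-side C. This is only a sketch of the rule as stated here, not the linker's actual implementation; the function name gs_model and the idea of passing the stub address as a parameter are assumptions made for illustration.

```c
#include <stdint.h>

/* 64 KiW = 128 KiB is what a 16-bit word address in Z can reach. */
#define INDIRECT_LIMIT_BYTES 0x20000UL

/* Hypothetical model of gs(func): returns the 16-bit word address that
   ends up in Z.  stub_byte_addr is the byte address of the matching JMP
   in .trampolines and is only used for targets beyond 128 KiB. */
static uint16_t gs_model (uint32_t func_byte_addr, uint32_t stub_byte_addr)
{
    uint32_t target = (func_byte_addr < INDIRECT_LIMIT_BYTES)
                      ? func_byte_addr   /* reachable directly */
                      : stub_byte_addr;  /* go via the stub    */

    return (uint16_t) (target >> 1);     /* byte address -> word address */
}
```

For instance, a function at byte address 0x1000 resolves to word address 0x0800, whereas a function at 0x24680 with a stub at 0xe4 resolves to 0x0072.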
Take for example the following C code:
extern int far_func (void);
int main (void)
{
    int (*pfunc)(void) = far_func;
    __asm ("" : "+r" (pfunc)); /* Forget content of pfunc. */
    return pfunc();
}
The __asm is used to keep the compiler from optimizing the indirect call to a direct one. Then run
> avr-gcc main.c -o main.elf -mmcu=atmega2560 -save-temps -Os -Wl,--defsym,far_func=0x24680
> avr-objdump -d main.elf > main.lst
For the sake of brevity, we just define symbol far_func on the command line.
The assembly dump in main.s shows that far_func might require a linker stub:
main:
ldi r30,lo8(gs(far_func))
ldi r31,hi8(gs(far_func))
eijmp
The final executable listing in main.lst then shows that the stub is actually generated and used:
main.elf: file format elf32-avr
Disassembly of section .text:
...
000000e4 <__trampolines_start>:
e4: 0d 94 40 23 jmp 0x24680 ; 0x24680 <far_func>
...
00000104 <main>:
104: e2 e7 ldi r30, 0x72 ; 114
106: f0 e0 ldi r31, 0x00 ; 0
108: 19 94 eijmp
main loads Z=0x0072, which is the word address for byte address 0x00e4, i.e. the code jumps indirectly to 0x00e4, and from there it jumps directly to 0x24680.
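The address arithmetic can be checked on the host with a small C sketch. The helper names are made up for illustration; the bit layout used for decoding is the standard 32-bit JMP encoding (1001 010k kkkk 110k followed by the low 16 bits of k), where k is a word address.

```c
#include <stdint.h>

/* AVR code addresses in Z and in JMP/CALL operands are word addresses. */
static uint32_t word_to_byte_addr (uint32_t word_addr)
{
    return word_addr << 1;
}

/* Decode the target of a JMP from its four opcode bytes as they appear
   in the listing (two little-endian 16-bit words). */
static uint32_t jmp_target_byte_addr (const uint8_t op[4])
{
    uint16_t w0 = (uint16_t) (op[0] | (op[1] << 8));
    uint16_t w1 = (uint16_t) (op[2] | (op[3] << 8));

    uint32_t k = ((uint32_t) ((w0 >> 4) & 0x1F) << 17)  /* k21..k17 */
               | ((uint32_t) (w0 & 1) << 16)            /* k16      */
               | w1;                                    /* k15..k0  */

    return word_to_byte_addr (k);
}
```

With the values from the listing, word_to_byte_addr (0x0072) gives 0x00e4, and decoding the bytes 0d 94 40 23 gives 0x24680.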