TL;DR: I'm playing around with extended asm (EASM) and burned my fingers. Do my constraints make sense?
As I am playing around with memory, I wanted to test reading some memory manually on an ARM CPU (Cortex-A9).
(Disclaimer: this is for learning purposes. Of course I agree that relying on the optimizer is the right thing to do 99.999% of the time, but I would really like to understand why everything explodes here.)
On the hardware in question:
- The bus between CPU and memory is 64 bits wide, so I'm trying to use the ldrd instruction to load two 32-bit words at once.
- The data in memory is 128-bit aligned, so let's use the ldrd instruction twice.
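The alignment assumption above can be checked with a small helper. This is only a sketch: `is_aligned` is a hypothetical name that is not part of my test program, and it assumes the alignment is a power of two. As far as I know, ldrd needs at least a word-aligned address on ARMv7-A, so an address such as 0xdeadbeef (odd) only makes sense as a compile-time placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not in the original program).
 * Returns true when addr is a multiple of alignment.
 * Assumption: alignment is a non-zero power of two. */
static inline bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return (addr & (alignment - 1u)) == 0u;
}
```

For example, `is_aligned(0xdeadbee0u, 16)` holds, while `is_aligned(0xdeadbeefu, 4)` does not.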
My problem is that the generated assembly (or the attempt to generate it) does not make sense, independently of:
- Compiler (tested with GCC and clang)
- Optimization level (tested with -O0 -Og -O2 -O3)
- Cross / native (tested with arm-linux-gnueabihf-gcc and native gcc)
Here is a minimal example demonstrating the issue:
#include <stdint.h>
// custom structure: represent 128 bits
typedef struct __attribute__ ((packed)) u128
{
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;
} u128;
int main(void)
{
    uint32_t *ptr = (uint32_t*) 0xdeadbeef; // For test purpose, just a random location in memory
    u128 words;
    // 1st read: 64 bits
    asm volatile inline (
        "ldrd %[high_32b], %[low_32b], [%[addr]], #8"
        : [high_32b] "=X" (words.a), [low_32b] "=X" (words.b)
        : [addr] "r" (ptr));
    // 2nd read: 64 bits
    asm volatile inline (
        "ldrd %[high_32b], %[low_32b], [%[addr]], #8"
        : [high_32b] "=X" (words.c), [low_32b] "=X" (words.d)
        : [addr] "r" (ptr));
    return 0;
}
GCC
arm-linux-gnueabihf-gcc -Wall -Wextra -O3 -g -ggdb broken_asm.c -o broken_asm
/tmp/ccIaxiTz.s: Assembler messages:
/tmp/ccIaxiTz.s:51: Warning: base register written back, and overlaps one of transfer registers
Disassembly (radare2 -A -c 's sym.main; pdf' broken_asm):
│ 0x000003da f3e80221 ldrd r2, r1, [r3], 8
│ 0x000003de f3e80232 ldrd r3, r2, [r3], 8 ; broken_asm.c:27 asm volatile inline (
So yes, the warning makes sense: ldrd r3, r2, [r3], 8 looks broken (expected: source != destinations, for instance ldrd r3, r2, [r4], 8).
Clang
clang -mtune=cortex-a9 --target=arm-linux-gnueabihf -isystem /usr/arm-linux-gnueabihf/include -Wall -Wextra -O3 -g -ggdb broken_asm.c -o broken_asm
broken_asm.c:22:5: error: Rt must be even-numbered
    "ldrd %[high_32b], %[low_32b], [%[addr]], #8"
    ^
<inline asm>:1:11: note: instantiated into assembly here
    ldrd r1, r2, [r0], #8
    ^
broken_asm.c:28:5: error: base register needs to be different from destination registers
    "ldrd %[high_32b], %[low_32b], [%[addr]], #8"
    ^
<inline asm>:1:11: note: instantiated into assembly here
    ldrd r0, r1, [r0], #8
    ^
2 errors generated.
So, let's read some error messages:
base register needs to be different from destination registers
OK, this is comparable to the GCC issue (and yes, it feels more like an error than a warning).
error: Rt must be even-numbered
Wait, what? ldrd r1, r2 ...
Indeed, the first operand must be an even-numbered register and the second one must be the following odd-numbered register.
From the ARM Instructions Reference:
Rt: The first destination register. For an ARM instruction, must be even-numbered and not R14.
Rt2: The second destination register. For an ARM instruction, must be <R(t+1)>.
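To make the two diagnostics concrete, here is a minimal sketch of the operand rules as I understand them. It combines the quoted Rt/Rt2 rule with the "base register must differ from the destination registers" rule for the write-back form. `ldrd_a32_operands_ok` is a hypothetical helper, it only models the ARM-state wording quoted above, and it makes no claim about Thumb encodings.

```c
#include <stdbool.h>

/* Hypothetical checker (illustration only) for
 *   ldrd rt, rt2, [rn], #imm
 * using the ARM-state rules quoted above.
 * Registers are given by number (r0 = 0, r1 = 1, ...). */
static bool ldrd_a32_operands_ok(unsigned rt, unsigned rt2,
                                 unsigned rn, bool writeback)
{
    /* Rt must be even-numbered and not R14. */
    if (rt % 2u != 0u || rt == 14u)
        return false;
    /* Rt2 must be R(t+1). */
    if (rt2 != rt + 1u)
        return false;
    /* With write-back, the base must not be a destination register. */
    if (writeback && (rn == rt || rn == rt2))
        return false;
    return true;
}
```

With this model, `ldrd r1, r2, [r0], #8` fails the first rule and `ldrd r0, r1, [r0], #8` fails the third, which matches the two clang errors, while something like `ldrd r2, r3, [r4], #8` passes.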
I am pretty sure I'm doing something wrong in my extended asm (as it's nearly the only effective code here, that's not so hard to guess).
Here is my constraints understanding so far:
Output:
The registers in which I would like the output are, as far as I understand, write-only.
‘=’ identifies an operand which is only written
I started with "g" as a constraint (same effect) but opted for "X" to give the mighty compiler more freedom:
'X' Any operand whatsoever is allowed.
Input:
I'm using "r" because I would like both ldrd instructions to read from the same register.
I also tried with "X" but got the same issue.
'r' A register operand is allowed provided that it is in a general register.
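For comparison, here is a sketch of a more restrictive way to write the constraints. It is not a verified fix, and it rests on several assumptions: `load64` is a hypothetical helper; a single 64-bit "r" operand is assumed to be given a valid register pair by the compiler; `%H0` is assumed to be accepted by GCC and clang on ARM as the modifier naming the second register of that pair; "=&r" (early clobber) is used so the pair cannot share a register with the address; and the write-back form is avoided so the input register is never modified. The non-ARM branch exists only so the snippet builds on a host machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): read 64 bits from p.
 * Assumption: p is suitably aligned for ldrd. */
static inline uint64_t load64(const void *p)
{
    uint64_t v;
#if defined(__arm__)
    /* %0  : first register of the pair holding v
     * %H0 : second register of that pair (assumed operand modifier)
     * "=&r": early-clobber output, kept apart from the address register
     * "m"  : tells the compiler that the memory at p is read */
    __asm__ volatile ("ldrd %0, %H0, [%1]"
                      : "=&r" (v)
                      : "r" (p), "m" (*(const uint64_t *)p));
#else
    /* Portable fallback for non-ARM hosts. */
    memcpy(&v, p, sizeof v);
#endif
    return v;
}
```

Reading 128 bits would then be two calls, `load64(p)` and `load64((const char *)p + 8)`, instead of relying on the post-increment of an input operand.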
Some notes as this post is too short :/
- Host: Linux (Debian)
- Target: Zynq 7000 (PS side: Cortex A9)
- Clang --version: Debian clang version 11.0.1-2
- cross gcc: arm-linux-gnueabihf-gcc (Debian 10.2.1-6) 10.2.1 20210110
- native gcc: gcc (Debian 10.2.1-6) 10.2.1 20210110
- Tweaking a binary to manually set registers in op-codes seems to work as intended
So, I genuinely have no idea what I'm doing wrong here. Any pointer welcomed.