
This code:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };

const char myTable[] = { 1, 2, 3, 4 };

int keepPadding() {
  return (int)(&padding);
}

int foo() {
  return (int)(&myTable);  // <-- this is the part I'm looking at
}

compiles to the following assembly for the Thumb instruction set (abbreviated for clarity). Note in particular the adds, the second instruction of foo:

...
foo:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    adds    r0, r0, #10
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0
    .size   foo, .-foo
    .align  1
    .global bar
    .syntax unified
    .code   16
    .thumb_func
    .type   bar, %function

...
myTable:
    .ascii  "\001\002\003\004"

It looks like it's loading a pointer to the start of .rodata (ldr) and then adding the offset of myTable at run time (adds). Why not just load the address of the table directly?

Note: when I remove the const, it seems to load the address without the adds instruction (and myTable is placed in .data).

The context of the question is that I'm trying to hand-optimize some C firmware and noticed this adds instruction that seems to be superfluous, so I'm wondering if there's a way to restructure my code to get rid of it.

Note: this is all compiled for the ARM Thumb instruction set as follows (using arm-none-eabi-gcc version 11.2.1):

arm-none-eabi-gcc -Os -c -mcpu=cortex-m0 -mthumb temp.c -S

Also note: the example code here is intended to represent a snippet of a larger codebase. If myTable were the only thing compiled, it would land at offset 0 in .rodata and the adds instruction would disappear, but that is not the typical real-world case. To reproduce the assembly seen in the real code, I added padding before the table.

It is also reproduced on Godbolt.

Mike
  • What gcc version? I couldn't get any on godbolt to produce your assembly, it's always a single `ldr`. – Jester Jun 02 '22 at 23:23
  • gcc could be trying to load a base address of all constants in the file and then add individual offsets to reduce the number of `ldr` instructions. But I'm not sure. – fuz Jun 03 '22 at 00:36
  • @fuz: And/or to reduce the number of address constants in literal pools, if multiple `ldr` instructions can share the same base address? Hmm, I wonder if a linker is going to rewrite some of that placeholder stuff, since it's interesting that `#10` is the same number as the `a:` offset of `myTable` within `.rodata`. – Peter Cordes Jun 03 '22 at 04:49
  • @PeterCordes It is possible. The linker people have been up to no good and keep adding more and more progressively ridiculous linker relaxations with magic code patterns. – fuz Jun 03 '22 at 09:24
  • arm-none-eabi-gcc version 11.2.1. Full code example is here: https://gist.github.com/coder-mike/d2ccf6e5c9c1dfafec68c295cc82f8c7 – Mike Jun 03 '22 at 20:09
  • @fuz yes, it seems to be loading the base address of all the constants and then adding the offset. I can see why that would reduce the number of `ldr` instructions if there were multiple constants used within the same function. But here there is only one `ldr` either way, right? – Mike Jun 03 '22 at 20:20
  • @Mike I don't understand it either. Could you also provide the result of passing `-S` to the compiler invocation (to see what assembly the compiler generated)? Btw, `-d` is often more useful than `--disassemble-all` as it distinguishes between code and data, only disassembling code. – fuz Jun 03 '22 at 20:52
  • So with keepPadding and the padding, I can reproduce this with 9.x.x without a problem. The adds is there in the -S output. Evidently it is setting the base address to load from to the start of padding and then adding 10. – old_timer Jun 03 '22 at 23:30
  • -fno-section-anchors does make it go away. What section anchors are for, and why they are used here, I don't know. – old_timer Jun 03 '22 at 23:36
  • (It is generating a .LANCHOR at the start of .rodata and as a result needs to add 10.) I wonder if this is perhaps meant to optimize the constants for the ldr instructions: maybe after linking, if the address happens to allow it, it can be loaded into r0 with an immediate instead of a PC-relative load. – old_timer Jun 03 '22 at 23:42
  • Please edit your question to properly show the minimal example, as your example does not reproduce the problem (external links are not useful; edit the question). – old_timer Jun 04 '22 at 18:38
  • ARM Thumb is not an architecture; it is one of the many ARM instruction sets (subsets). – old_timer Jun 04 '22 at 18:40
  • You should update the C in the code block in your question to actually reproduce the asm output you show, especially include the fact that it has 2 arrays, so `myTable[]` is not the first thing in the `.rodata` section. Minimal is good, but "complete" and "verifiable" are also important. It should be something readers can copy/paste into https://godbolt.org/ and see that asm output. (Also including a godbolt short or full link with the source and compiler options is good, but doesn't substitute for having at least the source in the question.) – Peter Cordes Jun 04 '22 at 21:34

1 Answer


The question originally contained just this:

const char myTable[] = { 1, 2, 3, 4 };
int foo() {
  return (int)(&myTable);
}


arm-none-eabi-gcc -Os -c -mthumb so.c -o so.o
arm-none-eabi-objdump -D so.o

but it did not produce the adds:

Disassembly of section .text:

00000000 <foo>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <foo+0x4>)
   2:   4770        bx  lr
   4:   00000000    andeq   r0, r0, r0

Disassembly of section .rodata:

00000000 <myTable>:
   0:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff

The question has since been edited to show a reproducible example, and this answer has been edited as a result. I will still leave the answer working toward the same result step by step, since it may be of interest that getting to the anchor took a few components, to keep the behavior from being optimized away.

So from your question and this:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int foo() {
  return (int)(&myTable);
}

It is obvious why myTable is at an offset of 10.
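To make the offset concrete, here is a small sketch. It assumes a hypothetical struct, rodata_layout, standing in for how the compiler placed the two arrays in .rodata (separate global arrays are not guaranteed to be contiguous in C, but struct members are laid out in order). Because char arrays have an alignment of 1, myTable starts right after the 10 bytes of padding, which is the #10 in the adds.

```c
#include <stddef.h>

/* Hypothetical model of the .rodata section: the two arrays laid out
   back to back, in the order the compiler emitted them. */
struct rodata_layout {
    char padding[10]; /* 10 bytes, alignment 1 */
    char myTable[4];  /* starts immediately after padding */
};

/* Byte offset of myTable from the start of the block (where the anchor sits). */
size_t my_table_offset(void) {
    return offsetof(struct rodata_layout, myTable);
}
```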

But padding is optimized out so you still end up with the same result.

So:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int keepPadding() {
  return (int)(&padding);
}
int foo() {
  return (int)(&myTable);
}

The name of that function implies you already know all of this, and know what it took to make a minimal example.

arm-none-eabi-gcc -Os -c -mthumb so.c -S


foo:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    adds    r0, r0, #10
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0
    .size   foo, .-foo
    .global myTable
    .global padding
    .section    .rodata
    .set    .LANCHOR0,. + 0
    .type   padding, %object
    .size   padding, 10
padding:
    .space  10
    .type   myTable, %object
    .size   myTable, 4
myTable:
    .ascii  "\001\002\003\004"
    .ident  "GCC: (GNU) 11.2.0"

It is generating an anchor (.LANCHOR0, at the start of .rodata) and then referencing myTable relative to that anchor rather than directly through its own label.
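The two addressing strategies can be modelled in plain C. This is only an analogy, not what GCC does internally: the hypothetical struct rodata_block plays the role of .rodata, its start plays .LANCHOR0, and the pointer arithmetic in table_via_anchor corresponds to the ldr of the anchor followed by adds r0, r0, #10. Both functions yield the same address; with only one object referenced, the anchored form simply costs an extra instruction.

```c
#include <stddef.h>

/* Hypothetical stand-in for the .rodata section emitted for the example. */
struct rodata_block {
    char padding[10];
    char myTable[4];
};

static const struct rodata_block rodata = {
    { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
    { 1, 2, 3, 4 },
};

/* Anchored form: take the section base (".LANCHOR0"), then add the
   object's constant offset, like "ldr r0, .L5" + "adds r0, r0, #10". */
const char *table_via_anchor(void) {
    const char *anchor = (const char *)&rodata;
    return anchor + offsetof(struct rodata_block, myTable);
}

/* Direct form: the literal holds myTable's own address,
   like "ldr r0, .L5" with ".word myTable". */
const char *table_direct(void) {
    return rodata.myTable;
}
```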

I suspect it is to allow for an optimization of the ldr. Let's try:

 arm-none-eabi-gcc -Os -c -mthumb -mcpu=cortex-m4 so.c -S

foo:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0+10
    .size   foo, .-foo

00000008 <foo>:
   8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
   a:   4770        bx  lr
   c:   0000000a    .word   0x0000000a
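With -mcpu=cortex-m4 the anchor is still used, but the offset is folded into the literal itself (.word .LANCHOR0+10, which shows up as 0x0000000a in the unlinked object), so the sum is resolved at link time rather than at run time. A rough C analogue, again a hypothetical model with folded_literal standing in for the .L5 literal, is a static pointer initialized with an address constant plus an integer constant, which C allows in a static initializer:

```c
#include <stddef.h>

/* Hypothetical model of .rodata again. */
struct rodata_block_m4 {
    char padding[10];
    char myTable[4];
};

static const struct rodata_block_m4 rodata_m4 = {
    { 0 },
    { 1, 2, 3, 4 },
};

/* Stand-in for ".word .LANCHOR0+10": anchor plus offset is an address
   constant, fixed before the program runs. */
static const char *const folded_literal =
    (const char *)&rodata_m4 + offsetof(struct rodata_block_m4, myTable);

/* One load, no add: like "ldr r0, .L5; bx lr" in the cortex-m4 output. */
const char *table_folded(void) {
    return folded_literal;
}
```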

Yes, that fixed it. But what about linking it?

Disassembly of section .rodata:

00000000 <padding>:
    ...

0000000a <myTable>:
   a:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff

Disassembly of section .text:

00000010 <keepPadding>:
  10:   4800        ldr r0, [pc, #0]    ; (14 <keepPadding+0x4>)
  12:   4770        bx  lr
  14:   00000000    andeq   r0, r0, r0

00000018 <foo>:
  18:   4801        ldr r0, [pc, #4]    ; (20 <foo+0x8>)
  1a:   300a        adds    r0, #10
  1c:   4770        bx  lr
  1e:   46c0        nop         ; (mov r8, r8)
  20:   00000000    andeq   r0, r0, r0

No. I was hoping that the linker would replace the PC-relative load with a mov r0,#0, saving the load, which is (or might be) an optimization for systems that are not Cortex-M (or even for Cortex-M).

Note: this also works

arm-none-eabi-gcc -Os -c -mthumb -fno-section-anchors so.c -o so.o

00000008 <foo>:
   8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
   a:   4770        bx  lr
   c:   00000000    andeq   r0, r0, r0

foo:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    bx  lr
.L6:
    .align  2
.L5:
    .word   myTable
    .size   foo, .-foo
    .global myTable
    .section    .rodata
    .type   myTable, %object
    .size   myTable, 4
myTable:
    .ascii  "\001\002\003\004"
    .global padding
    .type   padding, %object
    .size   padding, 10

No anchor is used, so the address of myTable is used directly.

From my perspective, the "why" is that an anchor was used and the padding in front placed myTable at an offset from the anchor. So the ldr loads the anchor address, and the adds gets you from the anchor to the table.

Why the anchor? Exercise for the reader, or someone else.
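For context, the GCC manual describes -fsection-anchors as trying to reduce the number of symbolic address calculations by using shared "anchor" symbols to address nearby objects, and illustrates it with pseudocode in which several variables are reached from one shared base pointer. The sketch below writes both shapes out in valid C, using hypothetical names (const_tables, sum_direct, sum_via_anchor) and a struct to make the adjacency explicit. When one function touches several tables, a single base address plus constant offsets can replace one address literal per table; with only one table, as in this question, there is nothing to share. Whether GCC actually picks the anchored form for a given -mcpu remains the open question above.

```c
#include <stddef.h>

/* Hypothetical: three small constant tables placed next to each other,
   as the compiler might place them in one .rodata anchor block. */
struct const_tables {
    unsigned char t1[4];
    unsigned char t2[4];
    unsigned char t3[4];
};

static const struct const_tables tables = {
    { 1, 2, 3, 4 },
    { 10, 20, 30, 40 },
    { 100, 110, 120, 130 },
};

/* Source-level form: each table is named individually. As separate
   globals without anchors, each would need its own address literal. */
int sum_direct(size_t i) {
    return tables.t1[i] + tables.t2[i] + tables.t3[i];
}

/* Anchored form: one base address, each table reached by a constant
   offset, similar in spirit to the pseudocode in the GCC manual. */
int sum_via_anchor(size_t i) {
    const unsigned char *base = (const unsigned char *)&tables; /* the "anchor" */
    return base[offsetof(struct const_tables, t1) + i]
         + base[offsetof(struct const_tables, t2) + i]
         + base[offsetof(struct const_tables, t3) + i];
}
```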

old_timer
  • Based on your GitHub code you were 99% of the way there, or did you make it all the way and this was a test for us? – old_timer Jun 04 '22 at 18:42
  • Thanks for your feedback about my question. I thought carefully about whether to include the padding in the question or whether it would detract from the main point I was asking. In hindsight I probably made the wrong choice. Apologies. I don't know what makes you think I was testing anybody. My question was not hypothetical: I have real code that compiles with the `adds` instruction, and nothing in the gist indicates that I had any idea of how to get rid of it. And yes, I tried to get as far as I could on my own without asking for help and inconveniencing others, but I couldn't figure it out. – Mike Jun 04 '22 at 20:45
  • It's interesting that `-mcpu=cortex-m4` omits the `adds` instruction, but when I try `-mcpu=cortex-m0` it does NOT. – Mike Jun 04 '22 at 21:14
  • The `-fno-section-anchors` works for me. It reduces my [real code](https://github.com/coder-mike/microvium) from 8,436B to 8,424B (just 12B difference, lol). Probably because I don't have many tables. But the point is that it doesn't have some adverse effect on size overall. I'm curious why it's not the default for the compiler then. – Mike Jun 04 '22 at 21:28
  • This looks like a good answer, but the meta-advice in the post is not appropriate - remember that posts are for a wider readership, and thus complaints about the original post (or the original poster) are not ideal. The message here seemed a little exasperated too - if you are exhausted or irritated by helping people on Stack Overflow, then perhaps it is worth taking a break. – halfer Jun 05 '22 at 20:50
  • @halfer True. At the same time, the question did not demonstrate the problem. I guess I could have edited the question, but I do not think that is right; the OP should have. And apparently has, and now maybe this answer needs a rewrite... – old_timer Jun 06 '22 at 14:09
  • Answer edited to remove the comment. – old_timer Jun 06 '22 at 14:13
  • I got lucky with the cortex-m4 thing and didn't try m0. The m0 is much more limited than the m4 (armv6-m vs armv7-m, a difference of about 150 Thumb-2 extension instructions, give or take). And plain -mthumb is tied to whatever default the compiler was built with (armv4? armv7-a?), which has a rich set of full-sized instructions. So maybe the compiler authors have some generic rule like "put anchors in if the instruction set is rich enough." It still does not make much sense, though, and the fact that picking a different architecture changes how this works makes it even more baffling. – old_timer Jun 06 '22 at 14:15
  • So this is not only an exercise for the reader but a research project: when does GCC use section anchors? – old_timer Jun 06 '22 at 14:17