How do I get GCC to not delete reads from volatile pointers?

Question

I have the following code in my source file:

void *hardware = AllocateHardwareArea(SIZE);
volatile uint32_t *reader = (uint32_t *) hardware;
unsigned x;
for (x = 0; x < SIZE / sizeof(u32); ++x)
    (void) *reader++;
ReleaseHardwareArea(hardware);

But when I compile this on ARMv6-targeted GCC 4.9.2 with -O3, the compiler is deleting the entire for loop from the assembly language output:

STMFD   SP!, {R3,LR}
MOV     R0, #0
MOV     R1, #0x10000
BL      AllocateHardwareArea
LDMFD   SP!, {R3,LR}
B       ReleaseHardwareArea

Isn't volatile supposed to be for hardware register situations such as this?

Pay heed to warnings! You should get one for casting away the qualifier. — too honest for this site, Sep 28 '15 at 01:22
If @Olaf's comment doesn't address it, what happens if you add the option to turn off optimization or lower the optimization level? — clearlight, Sep 28 '15 at 01:39
[gcc 5.2.0 x86](http://goo.gl/IwIfBL) does not optimize it out (compare the code with or without the `*` operator). Perhaps a compiler bug, although to be sure please try to create a [MCVE](http://stackoverflow.com/help/mcve). — M.M, Sep 28 '15 at 01:54
@Olaf No qualifier is cast away in this code, unless you're talking about `(void)` but that is legal and should not generate a warning — M.M, Sep 28 '15 at 01:55
Show the code (or just function signature if you do not have the code) for `AllocateHardwareArea`. You could try having that function return `volatile void *`. — M.M, Sep 28 '15 at 03:01
Possibly [related info](http://stackoverflow.com/questions/28654418/requirements-for-behavior-of-pointer-to-volatile-pointing-to-non-volatile-object) — M.M, Sep 28 '15 at 03:01
@M.M: It is legal, but does cast away the `volatile` qualifier for the object `reader` points to (`reader` itself is **not** volatile). Thus the compiler is free just to optimize-out the loop if `reader` is not used elsewhere (even then it can just add `SIZE` to it). And, yes, the compiler **should** (I did not write "shall") generate a warning - perhaps this is enabled only with an option (the typical problem: one should always enable all warnings). — too honest for this site, Sep 28 '15 at 12:05
@Olaf `(void)` does not allow the compiler to optimize out the read . That sequence means that the read is performed and then the result is converted to `void`. The read may not be optimized out because it was read of a volatile object. — M.M, Sep 28 '15 at 12:28
@M.M: Point taken, thank you! I was on the wrong track. So it's leaving `SIZE` to be the suspect. — too honest for this site, Sep 28 '15 at 12:33
@Olaf nobody has been able to reproduce it yet, so it's hard to give anything concrete. I'm pretty sure it would be a compiler bug if the reads were optimized out (but see the related info I linked just to make the waters muddy). I think we need to wait for MCVE. — M.M, Sep 28 '15 at 13:12
Does rewriting the loop body as `{ *reader; ++reader; }` change anything? I'm wondering if the result of `reader++` is not getting volatile qualified, so the dereference gets removed. — Jonathan Wakely, Sep 28 '15 at 13:34
Hey, @Myria, please tell us what `SIZE` value you used for the test. We believe it was less than 4, in which case the loop body is never executed, and optimizing it away is expected. It is **very** rude behaviour to let others hanging like this, without any updates. Remember [Wheaton's Law](http://knowyourmeme.com/memes/wheatons-law)! — Nominal Animal, Sep 29 '15 at 14:53
I'm voting to close this question as off-topic because the problem was nothing to do with the compiler or other issues presented here. I had a bug that caused the "for" loop to be of zero length. — Myria, Oct 08 '15 at 00:08

Nominal Animal · Accepted Answer · 2015-09-28T12:33:20.137

I cannot replicate your results using GCC-4.9.3 (gcc-arm-none-eabi-4.9.3.2015q2-1trusty1 from Terry Guo's PPA for Ubuntu 14.04.2 LTS on x86_64). Starting with file.c,

void *AllocateHardwareArea(const unsigned int);
void  ReleaseHardwareArea(void *);

void test(const unsigned int size)
{
    void *hardware = AllocateHardwareArea(size);
    volatile unsigned int *reader = hardware;
    unsigned int x;

    for (x = 0; x < size / sizeof *reader; x++)
        (void)*reader++;

    ReleaseHardwareArea(hardware);
}

using arm-none-eabi-gcc-4.9.3 -march=armv6 -mtune=arm6 -O3 -S file.c compiles to the following assembly:

        .arch armv6
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "file.c"
        .text
        .align  2
        .global test
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, lr}
        mov     r4, r0
        bl      AllocateHardwareArea
        movs    r2, r4, lsr #2
        beq     .L2
        mov     r3, r0
        add     r2, r0, r2, asl #2
.L3:
        ldr     r1, [r3]
        add     r3, r3, #4
        cmp     r3, r2
        bne     .L3
.L2:
        ldmfd   sp!, {r4, lr}
        b       ReleaseHardwareArea
        .size   test, .-test

or, compiled to object code using arm-none-eabi-gcc-4.9.3 -march=armv6 -mtune=arm6 -O3 -c file.c, the disassembly using arm-none-eabi-objdump -d file.o is

file.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <test>:
   0:   e92d4010        push    {r4, lr}
   4:   e1a04000        mov     r4, r0
   8:   ebfffffe        bl      0 <AllocateHardwareArea>
   c:   e1b02124        lsrs    r2, r4, #2
  10:   0a000005        beq     2c <test+0x2c>
  14:   e1a03000        mov     r3, r0
  18:   e0802102        add     r2, r0, r2, lsl #2
  1c:   e5931000        ldr     r1, [r3]
  20:   e2833004        add     r3, r3, #4
  24:   e1530002        cmp     r3, r2
  28:   1afffffb        bne     1c <test+0x1c>
  2c:   e8bd4010        pop     {r4, lr}
  30:   eafffffe        b       0 <ReleaseHardwareArea>

The allocated area is read, as it should be, in unsigned int-sized units. In the assembly source, the read loop is between labels .L3 and .L2. In the object code, the read loop is at 1c..28.

Edited to add: Olaf pointed out in a comment, that OP might use a constant size. Let's examine that case, too:

void *AllocateHardwareArea(const unsigned int);
void  ReleaseHardwareArea(void *);

#define SIZE 32

void test(void)
{
    void *hardware = AllocateHardwareArea(SIZE);
    volatile unsigned int *reader = hardware;
    unsigned int x;

    for (x = 0; x < SIZE / sizeof *reader; x++)
        (void)*reader++;

    ReleaseHardwareArea(hardware);
}

The assembly is

    .arch armv6
    .fpu softvfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 1
    .eabi_attribute 30, 2
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "file2.c"
    .text
    .align  2
    .global test
    .type   test, %function
test:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd   sp!, {r3, lr}
    mov     r0, #32
    bl      AllocateHardwareArea
    mov     r3, r0
    ldr     r2, [r0]
    ldr     r2, [r0, #4]
    ldr     r2, [r0, #8]
    ldr     r2, [r0, #12]
    ldr     r2, [r0, #16]
    ldr     r2, [r0, #20]
    ldr     r2, [r0, #24]
    ldr     r3, [r3, #28]
    ldmfd   sp!, {r3, lr}
    b       ReleaseHardwareArea
    .size   test, .-test
    .ident  "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]"

and the disassembly of the object code

00000000 <test>:
   0:   e92d4008        push    {r3, lr}
   4:   e3a00020        mov     r0, #32
   8:   ebfffffe        bl      0 <AllocateHardwareArea>
   c:   e1a03000        mov     r3, r0
  10:   e5902000        ldr     r2, [r0]
  14:   e5902004        ldr     r2, [r0, #4]
  18:   e5902008        ldr     r2, [r0, #8]
  1c:   e590200c        ldr     r2, [r0, #12]
  20:   e5902010        ldr     r2, [r0, #16]
  24:   e5902014        ldr     r2, [r0, #20]
  28:   e5902018        ldr     r2, [r0, #24]
  2c:   e593301c        ldr     r3, [r3, #28]
  30:   e8bd4008        pop     {r3, lr}
  34:   eafffffe        b       0 <ReleaseHardwareArea>

i.e. the loop is simply unrolled. Of course, if SIZE is less than 4, then the loop is optimized away. Unrolling occurs for SIZE <= 71. For SIZE = 72, the object code is

00000000 <test>:
   0:   e92d4008        push    {r3, lr}
   4:   e3a00048        mov     r0, #72 ; 0x48
   8:   ebfffffe        bl      0 <AllocateHardwareArea>
   c:   e1a03000        mov     r3, r0
  10:   e2802048        add     r2, r0, #72     ; 0x48
  14:   e5931000        ldr     r1, [r3]
  18:   e2833004        add     r3, r3, #4
  1c:   e1530002        cmp     r3, r2
  20:   1afffffb        bne     14 <test+0x14>
  24:   e8bd4008        pop     {r3, lr}
  28:   eafffffe        b       0 <ReleaseHardwareArea>

Since you are compiling with extreme optimizations (-O3), I recommend rewriting your code snippet, sprinkling const liberally, instead of assuming the compiler detects const-ness automatically. For example, using the same commands as above, the following version

void *AllocateHardwareArea(const unsigned int);
void  ReleaseHardwareArea(void *);

void test(const unsigned int size)
{
    void *const hardware = AllocateHardwareArea(size);
    volatile unsigned int *const reader = hardware;
    const unsigned int n = size / sizeof *reader;
    unsigned int i;

    for (i = 0; i < n; i++)
        reader[i];

    ReleaseHardwareArea(hardware);
}

performs the exact same task, but with one fewer instruction within the inner loop. The assembly is

        .arch armv6
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "new.c"
        .text
        .align  2
        .global test
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, lr}
        mov     r4, r0
        bl      AllocateHardwareArea
        movs    r2, r4, lsr #2
        beq     .L2
        mov     r3, r0
        add     r2, r0, r2, asl #2
.L3:
        ldr     r1, [r3], #4
        cmp     r3, r2
        bne     .L3
.L2:
        ldmfd   sp!, {r4, lr}
        b       ReleaseHardwareArea
        .size   test, .-test
        .ident  "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]"

and the object code

Disassembly of section .text:

00000000 <test>:
   0:   e92d4010        push    {r4, lr}
   4:   e1a04000        mov     r4, r0
   8:   ebfffffe        bl      0 <AllocateHardwareArea>
   c:   e1b02124        lsrs    r2, r4, #2
  10:   0a000004        beq     28 <test+0x28>
  14:   e1a03000        mov     r3, r0
  18:   e0802102        add     r2, r0, r2, lsl #2
  1c:   e4931004        ldr     r1, [r3], #4
  20:   e1530002        cmp     r3, r2
  24:   1afffffc        bne     1c <test+0x1c>
  28:   e8bd4010        pop     {r4, lr}
  2c:   eafffffe        b       0 <ReleaseHardwareArea>

Perhaps you could test if your GCC compiles this latter version correctly? If not, we have a compiler bug at hand (assuming SIZE is at least 4), possibly/likely already fixed in later versions.

OP uses `SIZE` of unknow origin. Your code uses a function argument, so the compiler cannot optimize-out the loop, as it does not know the value of `size`. However, It`s still not clear to me if the `(void) cast is not the problem, removing the `volatile` qualifier. Could you point me at wwhy you think it does not? — too honest for this site, Sep 28 '15 at 12:17
Regarding the edit: My point is what if `SIZE < 4`? (the part about `(void)` has been clarified by @M.M.) Hmm, this is quite an issue for Flash-usage if it unrolls the loop for up to 17 transfers. I had not thought it would do that much. I'll keep that in mind. — too honest for this site, Sep 28 '15 at 12:36
@Olaf: I added the case where `SIZE` is a preprocessor macro. The loop is only optimized away iff `SIZE < 4` -- in which case no reads should be done anyway, using OP's logic. I thought that was obvious, because of how OP wrote the `for` loop (the loop condition is never fulfilled if `SIZE < 4`, so the loop body is never run in that case). It might not be obvious to OP, however. — Nominal Animal, Sep 28 '15 at 12:38
" It might not be obvious to OP" That is exactly my point. And the reason asking for complete information. — too honest for this site, Sep 28 '15 at 12:40
@Olaf: You're absolutely right. I *assumed* from the wording of the question that the OP knows what they are doing, and has verified that this behaviour only occurs with `-O3`. Very bad assumption from me. Do you think I should delete this answer, in the hopes that the OP clarifies `SIZE` (and perhaps also whether the results are only seen with `-O3`)? It might be worth it; I do **not** want to "reward" ill-formed/vague questions. — Nominal Animal, Sep 28 '15 at 12:43
Do not! It is a good answer and you do cover the subject. If OP clarifies, you can edit later. If she doesn't, you still can show both variants. — too honest for this site, Sep 28 '15 at 13:02
Unrelated: is there a reason you do not use the ARM launpad gcc? — too honest for this site, Sep 28 '15 at 13:03
upvoting as this contains useful info, even though it doesn't answer the question! — M.M, Sep 28 '15 at 13:13
@Olaf: Terry Guo's PPA does use [GCC ARM Embedded](https://launchpad.net/gcc-arm-embedded/) Launchpad sources. The non-PPA `gcc-arm-none-eabi` is from Debian, and is an older version (see Debian package update policies), and I don't want that. — Nominal Animal, Sep 28 '15 at 13:43
I read that, but I wonder why one should use Terry's and not just the original then. So there seems to be no advantage in that compared to the original. (not talking about the Debian - I know Debian well enough) — too honest for this site, Sep 28 '15 at 13:50
@Olaf: The PPA provides .deb packages (which is important for my use case), whereas the GCC ARM Embedded at Launchpad only provides "installers". — Nominal Animal, Sep 28 '15 at 22:56
@NominalAnimal This was a PEBKAC situation. The problem had nothing to do with compiler code generation. The entire problem was that, translated to the example's identifiers, `SIZE` was zero. I don't know how to close this topic as invalid. Sorry everyone =( — Myria, Oct 08 '15 at 00:06

How do I get GCC to not delete reads from volatile pointers?

1 Answers1