4

I want to use volatile bit-field struct to set hardware register like following code

union foo {
    uint32_t value;
    struct {
        uint32_t x : 1;
        uint32_t y : 3;
        uint32_t z : 28;
    };
};
union foo f = {0};
int main()
{
    volatile union foo *f_ptr = &f;
    //union foo tmp;
    *f_ptr =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    };
    //*f_ptr = tmp;
    return 0;
}

However, the compiler will make it to STR, LDR HW register several times. It is a terrible things that it will trigger hardware to work at once when the register is writed.

main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    movw    r3, #:lower16:.LANCHOR0
    movs    r0, #0
    movt    r3, #:upper16:.LANCHOR0
    ldr r2, [r3]
    orr r2, r2, #1
    str r2, [r3]
    ldr r2, [r3]
    orr r2, r2, #14
    str r2, [r3]
    ldr r2, [r3]
    and r2, r2, #15
    orr r2, r2, #160
    str r2, [r3]
    bx  lr
    .size   main, .-main
    .global f
    .bss
    .align  2

My gcc version is : arm-linux-gnueabi-gcc (Linaro GCC 4.9-2017.01) 4.9.4 and build with -O2 optimation


I have tried to use the local variable to resolve this problem

union foo {
    uint32_t value;
    struct {
        uint32_t x : 1;
        uint32_t y : 3;
        uint32_t z : 28;
    };
};
union foo f = {0};
int main()
{
    volatile union foo *f_ptr = &f;
    union foo tmp;
    tmp =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    };
    *f_ptr = tmp;
    return 0;
}

Well, it will not STR to HW register several times

main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    movs    r1, #10
    movs    r2, #15
    movw    r3, #:lower16:.LANCHOR0
    bfi r2, r1, #4, #28
    movt    r3, #:upper16:.LANCHOR0
    movs    r0, #0
    str r2, [r3]
    bx  lr
    .size   main, .-main
    .global f
    .bss
    .align  2

I think it is still not a good idea to use local variable, considering the limitation of binary size for embedded system.

Is there any way to handle this problem without using local variable?


old cat
  • 43
  • 4
  • 1
    On an unrelated note, if `f` is of type `union foo` then `&f` will be of type `union foo *`. That means such a cast isn't needed. – Some programmer dude Jan 06 '22 at 10:10
  • @Someprogrammerdude Yes, you're write. It is not necessary to do cast for `&f` – old cat Jan 06 '22 at 10:15
  • So you are basically saying that optimization is not performed properly and store/load register operation happens a lot of times? – kiner_shah Jan 06 '22 at 10:35
  • @kiner_shah yes, I want it to store or load register ine time. But I am not usre it is caused by optimization or it is just the role of gcc spec – old cat Jan 06 '22 at 12:24
  • Is the assembly you're reading generated with compiler optimizations enabled (`-O3`)? – John Bollinger Jan 06 '22 at 12:58
  • 2
    note you should never misuse unions like and also never use unions/structs across compile domains. while this is a fun exercise you should never do this in practice as you are well into implementation defined territory for the language and it will fail to work eventually. – old_timer Jan 06 '22 at 12:58
  • @old_timer Can you explain for more details? Is there any misuse of the sample code – old cat Jan 06 '22 at 13:06
  • historically we have known that unions share the same memory space (the purpose is back in the 60s and 70s memory was scarce, we dont have this problem today). folks have often misused the union by assuming the alignment of the items in the union because of the it works on my machine thing. my machine my compiler it works therefore this is how the language works, which is very much not true for C. newer specs have attempted to help with this but a bit field uint32_t and uint32_t not for a bitfield definition are shown as different in the spec, making this implementation defined – old_timer Jan 06 '22 at 13:59
  • bitfielts themselves have always been implementation defined and even decades ago was said you should never use them. – old_timer Jan 06 '22 at 14:00
  • basically be prepared for any line of code that uses this union/struct to fail and have to be replaced in the future. folks will argue with me on this point but some folks never see this in their career others that do see it it often takes 10+ years for varous reasons, but when it happens it is devastating to the product. YMMV. it adds no value at all to writing the code you can trivially make easy to read/use code using very safe solutions. – old_timer Jan 06 '22 at 14:02
  • and yes, most chip libraries are falling into this fad, but if you look, most mcu chip libraries are very very scary, not code you want to bet the success of your product on. remember they make a lot of money through support contracts, and there are legal disclaimers for the libraries making their code not their problem but yours. – old_timer Jan 06 '22 at 14:03
  • if you want to take this approach you either have to disable optimization, making for bloated/slow code or you need to use volatile, you use volatile and it generates a lot of code. there are other ways to do this, most folks will use a volatile pointer solution to keep it C language and that will both help and hurt the overall output. If you dont do either then because you are crossing compile domains (hardware is basically a separate domain) there will be times that what you are thinking you are asking the hardware to di will not happen or will be out of order or at a different time – old_timer Jan 06 '22 at 14:05
  • also volatile does not do what folks think it does there are numerous debates online about this. it is supposed to be meant to allow for hardware register access (ideally a simple uint32_t type pointer not a struct or union), but llvm/clang and gcc disagree on what it means as you can see sometimes in the output of their code. end of the day need to take all of these things into account IMO. but that is my opinion based on experience and failures in these solutions, you will find many other opinions on these topics – old_timer Jan 06 '22 at 14:14

3 Answers3

5

I think this is a bug in GCC. Per discussion below, you might consider using:

f_ptr->value =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    } .value;

By the C standard, the code a compiler generates for a program may not access a volatile object when the original C code nominally does not access the object. The code *f_ptr = (union foo) { .x = 1, .y = 7, .z = 10, }; is a single assignment to *f_ptr. So we would expect this to generate a single store to *f_ptr; generating two stores is a violation of the standard’s requirements.

We could consider an explanation for this to be that GCC is treating the aggregate (the union and/or the structure within it) as several objects, each individually volatile, rather than one aggregated volatile object.1 But, if this were so, then it ought to generate separate 16-bit strh instructions for the parts (per the original example code, which had 16-bit parts), not the 32-bit str instructions we see.

While using a local variable appears to work around the issue, I would not rely on that, because the assignment of the compound literal above is semantically equivalent, so the cause of why GCC generates broken assembly code for one sequence of code and not the other is unclear. With different circumstances (such as additional or modified code in the function or other variations that might affect optimization), GCC might generate broken code with the local variable too.

What I would do is avoid using an aggregate for the volatile object. The hardware register is, presumably, physically more like a 32-bit unsigned integer than like a structure of bit-fields (even though semantically it is defined with bit-fields). So I would define the register as volatile uint32_t and use that type when assigning values to it. Those values could be prepared with bit shifts or structures with bit-fields or whatever other method you prefer.

It should not be necessary to avoid using local variables, as the optimizer should effectively eliminate them. However, if you wish to neither change the register definition nor use local variables, an alternative is the code I opened with:

f_ptr->value =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    } .value;

That prepares the value to be stored but then assigns it using the uint32_t member of the union rather than using the whole union, and testing with ARM GCC 4.6.4 on Compiler Explorer (the closest match I could find on Compiler Explorer to what you are using) suggests it generates a single store with minimal code:

main:
        ldr     r3, .L2
        mov     r2, #175
        str     r2, [r3, #0]
        mov     r0, #0
        bx      lr
.L2:
        .word   .LANCHOR0
.LANCHOR0 = . + 0
f:

Footnote

1 I would consider this to a bug too, as the C standard does not make provision for applying the volatile qualifier on a union or structure declaration as applying to the members rather than to the whole aggregate. For arrays, it does say that qualifiers apply to the elements, rather than the whole array (C 2018 6.7.3 10). It has no such wording for unions or structures.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • Do you means that compiler treat the bit-field members of struct as different objects to access rather than an object of a whole struct – old cat Jan 06 '22 at 13:02
  • Does C formally define exactly how struct assignment works? In C++, it's each element separately, for example. (But C++ entirely refuses to copy to or from volatile structs: https://godbolt.org/z/Mz1qYb4eK unless you manually define a copy-constructor, and then you're not getting the compiler to copy the whole object: [volatile struct = struct not possible, why?](https://stackoverflow.com/q/49367852)). So probably C++ is irrelevant to this discussion about C, but GCC does also compile C++ with a separate front-end; IDK what the C front-end hands the back-end optimizer for struct copies. – Peter Cordes Jan 06 '22 at 13:06
  • Interesting to note that if we also make one of the struct *members* `volatile`, for a struct of 3 int members, that doesn't have any effect on the generated asm, https://godbolt.org/z/c81adEj93 e.g. using a qword store to update both the volatile member and another member in one instruction, on x86-64. (But again I think this is an unrelated tangent to the question about the whole struct being a volatile object in C.) – Peter Cordes Jan 06 '22 at 13:13
  • 2
    @PeterCordes: I think the C standard is lacking in formal semantics for this. But, per my footnote, even if the C standard or GCC did consider structure assignment to be member-by-member, then the compiler should have generated `strh` instructions for the 16-bit members. But it did not; it generated multiple 32-bit `str` instructions, causing all parts of the destination to be written multiple times. – Eric Postpischil Jan 06 '22 at 13:13
3

There seems to be no need for bitfields in your program: using uint16_t types should make it simpler and generate better code:

#include <stdint.h>

union foo {
    uint32_t value;
    struct {
        uint16_t x;
        uint16_t y;
    };
};
extern union foo f;

int main() {
    volatile union foo *f_ptr = &f;
    *f_ptr = (union foo) {
        .x = 10,
        .y = 20,
    };
    return 0;
}

Code generated by arm gcc 4.6.4 linux, as produced by Godbolt Compiler Explorer:

main:
        ldr     r3, .L2
        mov     r0, #0
        mov     r2, #10
        str     r0, [r3, #0]
        strh    r2, [r3, #0]    @ movhi
        mov     r2, #20
        strh    r2, [r3, #2]    @ movhi
        bx      lr
.L2:
        .word   f

The code is much simpler but still performs a redundant store for the 32-bit value: str r0, [r3, #0] when storing the union in one shot.

Investigating this, I tried different approaches and got surprising results: using struct or union assignment generates code potentially inappropriate for memory mapped hardware register, storing the fields element-wise seems required to generate the proper code:

#include <stdint.h>

union foo {
    uint32_t value;
    struct {
        uint16_t x;
        uint16_t y;
    };
};
extern union foo f;

void store_1(void) {
    volatile union foo *f_ptr = &f;
    *f_ptr = (union foo) {
        .x = 10,
        .y = 20,
    };
}
void store_2(void) {
    volatile union foo *f_ptr = &f;
    union foo bar = { .x = 10, .y = 20, };
    *f_ptr = bar;
}
void store_3(void) {
    volatile union foo *f_ptr = &f;
    f_ptr->x = 10;
    f_ptr->y = 20;
}
int main() {
    return 0;
}

Furthermore, removing the uint32_t value; generates calls to memcpy for the struct assignment version.

Code generated by arm gcc 4.6.4 linux:

store_1:
        ldr     r3, .L2
        mov     r2, #0
        str     r2, [r3, #0]
        mov     r2, #10
        strh    r2, [r3, #0]    @ movhi
        mov     r2, #20
        strh    r2, [r3, #2]    @ movhi
        bx      lr
.L2:
        .word   f
store_2:
        ldr     r3, .L5
        ldr     r2, .L5+4
        str     r2, [r3, #0]
        bx      lr
.L5:
        .word   f
        .word   1310730
store_3:
        ldr     r3, .L8
        mov     r2, #10
        strh    r2, [r3, #0]    @ movhi
        mov     r2, #20
        strh    r2, [r3, #2]    @ movhi
        bx      lr
.L8:
        .word   f
main:
        mov     r0, #0
        bx      lr

Further investigations seems to link the issue to the use of volatile union foo *f_ptr = &f; instead of tagging the union members as volatile:

#include <stdint.h>

union foo {
    uint32_t value;
    struct {
        volatile uint16_t x;
        volatile uint16_t y;
    };
};
extern union foo f;

void store_1(void) {
    union foo *f_ptr = &f;
    *f_ptr = (union foo) {
        .x = 10,
        .y = 20,
    };
    *f_ptr = (union foo) {
        .x = 10,
        .y = 20,
    };
}
void store_2(void) {
    union foo *f_ptr = &f;
    union foo bar = { .x = 10, .y = 20, };
    *f_ptr = bar;
    *f_ptr = bar;
}
void store_3(void) {
    union foo *f_ptr = &f;
    f_ptr->x = 10;
    f_ptr->y = 20;
    f_ptr->x = 10;
    f_ptr->y = 20;
}

Code generated:

store_1:
        ldr     r3, .L2
        mov     r1, #10
        mov     r2, #20
        strh    r1, [r3, #0]    @ movhi
        strh    r2, [r3, #2]    @ movhi
        strh    r1, [r3, #0]    @ movhi
        strh    r2, [r3, #2]    @ movhi
        bx      lr
.L2:
        .word   f
store_2:
        ldr     r3, .L5
        ldr     r2, .L5+4
        str     r2, [r3, #0]
        bx      lr
.L5:
        .word   f
        .word   1310730
store_3:
        ldr     r3, .L8
        mov     r1, #10
        mov     r2, #20
        strh    r1, [r3, #0]    @ movhi
        strh    r2, [r3, #2]    @ movhi
        strh    r1, [r3, #0]    @ movhi
        strh    r2, [r3, #2]    @ movhi
        bx      lr
.L8:
        .word   f

As you can see, assigning the union does not generate appropriate code in store_2, even when qualifying the value member as volatile too.

Using the C99 compound literals seems to work correctly in store_1, generating redundant stores when the fields are qualified as volatile.

Yet I would recommend assigning the fields explicitly as in store_3, making the assignment order explicit too. If instead you want to generate a single 32-bit store, assuming it is correct for your hardware, Aki Suihkonen suggested an interesting approach.

The original problem is a side effect of how the compiler generates code for assigning compound literals to structures and unions: it first initializes the destination to all bits zero, then stores the members specified in the compound literal explicitly. Redundant stores are eliminated unless the destination is volatile qualified`. I don't believe this behavior is mandated by the C Standard, so it may well be compiler specific.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • That assembly has multiple stores to the target location, which OP seeks to avoid. – Eric Postpischil Jan 06 '22 at 11:05
  • @EricPostpischil: yes, both `store_1` and `store_2` generate inappropriate code, `store_3` seems fine. – chqrlie Jan 06 '22 at 11:07
  • *using struct or union assignment generates code inappropriate for memory mapped hardware register* Do you mean that `store_2` generates this instruction `str r2, [r3, #0]` which is not correct - should be `str r2, [r3, #2]`? – kiner_shah Jan 06 '22 at 11:23
  • 2
    @kiner_shah: assigning the fields `x` and `y` with a single store may be inappropriate, depending on hardware specific behavior. 2 separate stores seem safer. Note also the problem in `store_2` for the third case, where the `volatile` tag on the field members is ignored, even if all members are tagged as `volatile`. – chqrlie Jan 06 '22 at 11:25
  • @chqrlie My sample code is bad, the bit-field size may be arbitrary (e.g. uint32_t x : 1, uint32_t y : 31 ) – old cat Jan 06 '22 at 12:30
  • @oldcat: if you can assign the 32-bit value in one store, use the solution proposed by *Aki Suihkonen*. If it does not generate the proper code, you can define an inline function that uses a local variable and assigns the 32-bit value as `f_ptr->value = bar.value;` – chqrlie Jan 06 '22 at 13:23
3

You can force the aggregate union to be written in one go with

f_ptr->value = (union foo) {
    .x = 10,
    .y = 20,
}.value;

// produced asm
mov     r1, #10
orr     r1, r1, #1310720
str     r1, [r0]
bx      lr
Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57