1

For a hardware driver, I need a method to write a C++ struct of 8 x uint32_t to a block of 8 successive 32-bit hardware registers. Here is my code:

#include "stdint.h"

typedef struct
{
    uint32_t a[8];
} TCmd;

class MyClass {
    public:
        void writeCommandBlock( TCmd* ap_src)
        
        {
             TCmd * p_dest = reinterpret_cast< TCmd*>(0x40080000); // base register address
             *p_dest = *ap_src;
        }
};

int main()
{
    MyClass m;

    TCmd cmd;
    cmd.a[0] = 12;

    m.writeCommandBlock(&cmd);
}

(The actual struct will have bit-fields and the register address will be defined by a constant).

I want to use a simple assignment operator (as shown above):

*p_dest = *ap_src;

because I am working with an ARM Cortex M4 processor and want to use the ldm/stm instructions, yielding something like:

        ldr     r1, [sp, #4]
        ldr     r0, [sp]
        ldm     r1!, {r2, r3, r12, lr}
        stm     r0!, {r2, r3, r12, lr}
        ldm     r1, {r2, r3, r12, lr}
        stm     r0, {r2, r3, r12, lr}

which should be faster than memcpy etc.

The problem comes when I enable optimisation - the assignment is optimised away. So I declare p_dest as volatile:

volatile TCmd * p_dest = reinterpret_cast< TCmd*>(0x40080000);

but the assignment line then gives error:

<source>:16:21: error: no viable overloaded '='
            *p_dest = *ap_src;
            ~~~~~~~ ^ ~~~~~~~
<source>:3:9: note: candidate function (the implicit copy assignment operator) not viable: 'this' argument has type 'volatile TCmd', but method is not marked volatile
typedef struct
        ^
<source>:3:9: note: candidate function (the implicit move assignment operator) not viable: 'this' argument has type 'volatile TCmd', but method is not marked volatile
1 error generated.
Compiler returned: 1

How can I fix this please?

DavidA
  • 2,053
  • 6
  • 30
  • 54
  • The error suggests a solution: Mark `writeCommandBlock` as `volatile`. Have you tried that? You just change `void writeCommandBlock( TCmd* ap_src)` to `void writeCommandBlock( TCmd* ap_src) volatile` (just like you'd prepend `const` for `const` methods). I'm a little confused as to why a class is involved here though; it has no state, no useful actions to construct/destruct it, etc. Why not just a plain function? – ShadowRanger Jun 22 '22 at 14:16
  • Why was the assignment "optimised away"? Are you sure about that? I can't reproduce that. – Wyck Jun 22 '22 at 14:20
  • @ShadowRanger Thanks, but I did try adding volatile as you suggested but the same error remains. Class is overkill here but my real code is more complex. – DavidA Jun 22 '22 at 14:23
  • Does marking the *pointer* as volatile (instead of the target) help? Like this: `TCmd* volatile p_dest = reinterpret_cast(0x40080000);`. – Adrian Mole Jun 22 '22 at 14:24
  • @AdrianMole Why should he mark the pointer as volatile? The struct should be volatile. – BЈовић Jun 22 '22 at 14:26
  • 3
    @Wyck: With high enough optimization levels, most compilers that can determine an assignment occurs that is never read from later can be omitted. It doesn't really matter if one compiler doesn't do it, you need `volatile` to guarantee the writes are actually performed. `volatile` was *created* to handle this exact case, writing to DMA hardware where the writes themselves are instructions or data being fed to the hardware, and the reads, if any, may read back unrelated data (you might write to the same address 100 times, then read 100 times, and you don't want writes skipped or reads cached). – ShadowRanger Jun 22 '22 at 14:28
  • @Wyck I'm testing the result in 'Compiler Explorer' at https://godbolt.org/ . The assignment certainly seems to be optimised away at -O1 and higher. Unfortunately I don't know how to share the code there. – DavidA Jun 22 '22 at 14:33
  • Find "share" at the right side of the godbolt screen. – n. m. could be an AI Jun 22 '22 at 15:18
  • https://stackoverflow.com/questions/4644296/cant-assign-an-object-to-a-volatile-object – n. m. could be an AI Jun 22 '22 at 15:33

1 Answers1

0

I'm assuming you don't want TCmd to be volatile so constructing it is fast and only the final commit to the hardware registers should be volatile.

How about this:

#include <cstdint>
#include <array>
#include <algorithm>

using TCmd = std::array<uint32_t, 8>;

class MyClass {
    public:
        void writeCommandBlock(const TCmd & ap_src) 
        {
             volatile uint32_t * p_dest = reinterpret_cast<volatile uint32_t *>(0x40080000); // base register address
             std::copy(&ap_src[0], &ap_src[ap_src.size()], p_dest);
        }
};

int main()
{
    MyClass m;

    TCmd cmd;
    cmd[0] = 12;

    m.writeCommandBlock(cmd);
}

Note: With gcc and -O2 this turns into a loop, with -O3 is unrolls into 8 loads and stores. But the loops seems to be the preferable code.

While a ldm opcode could be preferable I believe stm on MMIO registers is undefined. At least on something like the RaspberryPI the device interface is horribly fragile and likely to do the wrong thing.

You should probably also add a memory barrier after the write to ensure it actually happens and completes.

Goswin von Brederlow
  • 11,875
  • 2
  • 24
  • 42
  • Thanks for this. I'm particularly interested in your comment about stm and MMIO registers. I can't actually use std::array because our command is actually a struct of bit-fields. – DavidA Jun 22 '22 at 16:17
  • `std::tuple` then. For stm on MMIO registers you have to check the hardware specs for your device. MMIO is special and doesn't necessarily support everything that normal memory supports. – Goswin von Brederlow Jun 22 '22 at 17:32
  • I've tried std::tuple but can't get the placement new / assignment to compile. https://godbolt.org/z/eMqWG379e – DavidA Jun 23 '22 at 11:09
  • First if you want to use a tuple to store into MMIO regs then the regs need to be volatile. Then you can call the constructor with palcement new to initialize the object. C++ adds construct_at for this. And you could create a temporary in memory and assign, but that then initializes the volatile temp and copies it to the registers. Really inefficient. If you want to copy instead of construct then stick with the copying loop. https://godbolt.org/z/ah3145hcx Note: the `pause` is there so you see what each statement produces as code. – Goswin von Brederlow Jun 23 '22 at 11:33