Function for unaligned memory access on ARM

Question

I am working on a project where data is read from memory. Some of this data are integers, and there was a problem accessing them at unaligned addresses. My idea would be to use memcpy for that, i.e.

uint32_t readU32(const void* ptr)
{
    uint32_t n;
    memcpy(&n, ptr, sizeof(n));
    return n;
}

The solution from the project source I found is similar to this code:

uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp=(const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}

So my questions:

Isn't the second version undefined behaviour? The C standard says in 6.2.3.2 Pointers, at 7:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned 57) for the pointed-to type, the behavior is undefined.

As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not corrently aligned? And if it is, there is no point for the function as you could write *(uint32_t*) to fetch the memory. Additionally, I think I read somewhere that the compiler may expect an int* to be aligned correctly and any unaligned int* would mean undefined behaviour as well, so the generated code for this function might make some shortcuts because it may expect the function argument to be aligned properly.

The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer (no registers) inside a driver). Maybe that's why it does not use memcpy since it won't work on volatile data. But, in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such a synchronization, why would you need volatile?
Is there a canonical/accepted/better solution to this memory access problem? After some searching I come to the conclusion that you need a mutex and do not need volatile and can use memcpy.

P.S.:

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 1581.05
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

The union insures sufficient alignment for all types contained within it. `memcpy` is the usual method these days however it will not work with volatile accesses and sometimes you have to feed the optimizer the version it is most comfortable with. As for `volatile` it can in itself serve as a synchronization primitive, such as reading back the volatile results _after_ writing a volatile I/O register, but other mechanisms such as compiler barriers are often used. In addition the `volatile` qualifier forces byte-wise and consecutive access, which may be important for I/O. — doynax, Aug 04 '17 at 12:00
For the second version, it's the source pointer that isn't correctly aligned, rather than the resulting pointer, but the act of calling the function is UB. — Ian Abbott, Aug 04 '17 at 12:06
Your own (first) version looks good, especially if the compiler does the `memcpy` inline. — Ian Abbott, Aug 04 '17 at 12:07
In particular what you need to keep in mind here is that I/O registers are not normal memory buffers and tend to have side-effects and limitations for which you need to carefully force particular desired access patterns. For instance reading a status register may acknowledge an event, only 32-bit accesses may be supported, and careless read-modify-write cycles to flip a bit may induces races against hardware as well as software. So take care and read the datasheet carefully. — doynax, Aug 04 '17 at 12:07
Another thing to consider for portability is byte-ordering. For example, if the 32-bit unsigned integer value is embedded in a message that has a prescribed byte order for multi-byte integers, then you should extract the value in an endian-portable fashion. — Ian Abbott, Aug 04 '17 at 12:15
@hochl - what ARM. From ARMv6 there is no need for the aligned access. — 0___________, Aug 04 '17 at 12:18
@Ian Aboot: I mean the `const uint32_t*` argument is UB, yes, so I think the function call is UB already. @PeterJ: `/proc/cpuinfo` says ARMv7 revision 10 (v7l), but readelf says for `File Attributes` `Tag_ABI_align_needed: 8-byte` and `Tag_CPU_unaligned_access: v6`. — hochl, Aug 04 '17 at 12:28
So it is run under linux. My comment was about the bare metal — 0___________, Aug 04 '17 at 12:29
Read [this question on Unaligned ARM Linux accesses](https://stackoverflow.com/questions/16548059/how-to-trap-unaligned-memory-access). It maybe helpful. Often what you show is a bandaid. It is better to investigate why callers are passing around data like this and possibly fix it for all platforms. Even if a platform support unaligned accesses, it is most likely slower. — artless noise, Aug 08 '17 at 02:53
@PeterJ_01: Individual word and halfword loads and stores may not require aligned access, but given e.g. `struct {int x,y;} *foo,*bar;` a compiler may use some `LDRD/STRD/LDRM/STRM` to process something like `*foo = *bar;`, or even something like `x=foo->x; y=foo->y;`. It is in general not safe to use unaligned operations when using default compiler options, unless one declares the pointers in question `volatile`, because there's no way of knowing what operations a compiler might try to consolidate. — supercat, Aug 24 '17 at 15:33

score 2 · Accepted Answer · answered Aug 04 '17 at 12:12

This code

uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp=(const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}

passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.

The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard (emphasis mine):

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

The use of volatile with respect to the correct way to access memory/registers directly on your particular hardware would have no canonical/accepted/best solution. Any solution for that would be specific to your system and beyond the scope of standard C.

it is not an abstract implementation. It is the ARM gcc where "The Standard" UBs are very well defined. Your answer is too theoretical. — 0___________, Aug 04 '17 at 12:26
@PeterJ True, but the OP asked for a "canonical/accepted/better" solution than the already-working one he has. Outside of the function being called with a `uint32_t *` instead of a `void *`, there isn't one, is there? And even that UB usage of `uint32_t *` "works" on the specific platform, apparently. — Andrew Henle, Aug 04 '17 at 12:45
OP forgot to mention that it not a bare metal system and he writes the program which will be executed under linux. My deliberations are about the bare metal solutions were better does not mean portable. x86 system - it is very unlikely to have a bare metal system - unless you program MBs bioses. ARM - it is not as obvious :) — 0___________, Aug 04 '17 at 12:58
@O: Historically, implementations intended to be suitable for low-level programming would process constructs upon which "the Standard imposes no requirements", but where the target environment documented a characteristic behavior, "in a documented fashion characteristic of the environment", but clang and gcc interpret "the Standard imposes no requirements" as an invitation to behave gratuitously nonsensically, even on platforms where a useful characteristic behavior exists. — supercat, Feb 09 '23 at 19:10

score 1 · Answer 2 · answered Aug 21 '17 at 17:21

Implementations are allowed to define behaviors in cases where the Standard does not, and some implementations may specify that all pointer types have the same representation and may be freely cast among each other regardless of alignment, provided that pointers which are actually used to access things are suitably aligned.

Unfortunately, because some obtuse compilers compel the use of "memcpy" as an escape valve for aliasing issues even when pointers are known to be aligned, the only way compilers can efficiently process code which needs to make type-agnostic accesses to aligned storage is to assume that any pointer of a type requiring alignment will always be aligned suitably for such type. As a result, your instinct that approach using uint32_t* is dangerous is spot on. It may be desirable to have compile-time checking to ensure that a function is either passed a void* or a uint32_t*, and not something like a uint16_t* or a double*, but there's no way to declare a function that way without allowing a compiler to "optimize" the function by consolidating the byte accesses into a 32-bit load that will fail if the pointer isn't aligned.

Function for unaligned memory access on ARM

2 Answers2