0

I have inherited a very ancient (start of 2000s) Code base that consists of a driver and a kernel module and it crashes for some combinations of compiler flags/architectures. At some point an offset is incorrect for an unknown reason and when I skimmed the code base I found a lot of unaligned access and wanted to clean up this part to remove these potential issues. Example:

uint32_t* p = ....; // <-- odd address on right side.
uint32_t b = *p + 1;

I know that current processors handle such cases without noticeable delays, and I had problems finding hardware that would trigger a BUS error (I once had a RPi 1 that would produce bus errors when setting /proc/cpu/alignment to 4). I wanted to make sure the compiler generates code that correctly access the memory. My first draft for reading a 32 bit integer and converting it from network to host byte order (there was already a macro for that) for an unaligned address was thus (dest and src are pointers to some memory):

#define SWAP32(dest, src) \
    do { \
        uint32_t tmp_ = 0; \
        memcpy(&tmp_, (const char*)(src), sizeof(tmp_)); \
        tmp_ = bswap_32(tmp_); \
        memcpy((char*)(dest), &tmp_, sizeof(tmp_)); \
    } while(0)

I then wanted to create a macro to wrap pointer accesses for other cases that were not yet wrapped in a macro and came up with the following solution (sample contains code to make it compile):

#include <stdint.h>
#include <stdio.h>

struct __attribute__((packed)) unaligned_fix_32 { uint32_t n; };

#define GET32(x)    ((struct unaligned_fix_32*)(x))->n

int main(int argc, char* argv[])
{
    char mem[] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
    uint32_t n = GET32(&mem[1]);
    printf("0x%8.8x\n", n);
}

My goal is to wrap pointer accesses inside this macro everywhere. The idea is that, knowing the struct is packed, the compiler will generate code that accesses memory in a way that avoids a potential bus error on architectures that cannot access unaligned memory.

EDIT: I changed my code to the following new version:

static inline uint32_t get_be_unaligned_u32(const void* p)
{
    uint32_t u32;
    memcpy(&u32, (const char*)p, sizeof(u32));
    return bswap_32(u32);
}
#define GET32(src)   get_be_unaligned_u32(src)

Please answer questions also for the new version:

  1. Is this a workable solution on real hardware? Can anyone check this? Using -fsanitize=alignment produces the proper exceptions without the attribute packed, but the generated assembler simply checks addresses with & 0x3 and raises a trap if it is not zero.
  2. Is this undefined behavior?
  3. Are there better/other methods to detect/fix/simulate unaligned access inside a C/C++ program?
  4. Can I somehow do this using QEmu?
  5. Is the SWAP32 macro UB?
hochl
  • 12,524
  • 10
  • 53
  • 87
  • `((struct unaligned_fix_32*)(x))->n` violates C++'s strict aliasing rules putting you in UB Land. AFAIK x86/64 systems will handle unaligned access just fine but ARM systems can/will crash on unaligned access. – NathanOliver Oct 25 '22 at 16:15
  • 2
    Why is the compiler generating code with unaligned accesses on hardware that doesn't support it? – Pete Becker Oct 25 '22 at 16:26
  • 1
    @NathanOliver: It seems recent 64 bit ARM just handles unaligned access fine, too. – hochl Oct 25 '22 at 16:38
  • Many processors can handle unaligned fetches. The issue is that an unaligned quantity takes one or more extra fetches (except those that are double-word wide). Aligned quantities are more efficient to fetch. – Thomas Matthews Oct 25 '22 at 16:43
  • Wouldn't `class AlignedAccessor { uint32_t* m_p; public: AlignedAccessor(uint32_t* p) : m_p(p) { assert((reinterpret_cast(p) & 0x3) == 0); } ... implementation of operators... };` do the trick? You may even be able to attach additional info about the call site by using a defaulted parameter of type [`std::source_location`](https://en.cppreference.com/w/cpp/utility/source_location) which you could print out to stderr in debug configuration... – fabian Oct 25 '22 at 16:58
  • Do not tag both C and C++ except when asking about differences or interactions between the two languages. Pick one language and delete the other tag. – Eric Postpischil Oct 25 '22 at 17:17
  • @ThomasMatthews: Following https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality (check out some of the comments) this makes no difference these days. The post is from 2012, so the situation surely hasn't become worse. – hochl Oct 25 '22 at 18:58
  • @hochl: According to the article, the data in the tables shows that aligned fetches are faster. My point is that extra work is required for unaligned data. Not only are multiple fetches required, the processor has to do some bit manipulation to get the data into the proper place in its word/register. This takes time. Whether the time savings is significant or not is an opinion. In time critical, embedded applications, there are times when this is significant (such as the accuracy of a 1ms ISR or reaching critical events). So, unaligned is slower by his data. – Thomas Matthews Oct 25 '22 at 19:48
  • Do you have a target that crashes on unaligned accesses? If not, why do you spend time and effort in this? This feels like premature effort, nearly as useless as premature optimization. If automatic instrumentation already reveals the problem during testing, why invest all this? – the busybee Oct 25 '22 at 20:46
  • @hochl *this makes no difference these days* [Absolutely **not** true](https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815). If you want to play with fire by writing code to low standards and invoking undefined behavior so that your app is one hair away from crashing, that's on you. But to claim it "makes no difference" when you invoke UB is flat-out wrong. Because you're just **hoping** it will never "make a difference". – Andrew Henle Oct 25 '22 at 21:43
  • @AndrewHenle: "Makes no difference" was meant to the speed increase. I'm well aware about all the implications of UB and alignment (and that makes a difference, for the bus errors alone), that is why I started this post in the first place. I added further details to the question. – hochl Oct 26 '22 at 09:12

0 Answers0