
There have previously been some great answers on memory alignment, but I feel they don't completely answer some questions.

E.g.:

What is data alignment? Why and when should I be worried when typecasting pointers in C?

What is aligned memory allocation?

I have an example program:

#include <iostream>
#include <vector>
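#include <cstring>
#include <cstdint>

// Reads 4 bytes at x.data() + offset through an int32_t pointer.
int32_t cast_1(int offset) {
  std::vector<char> x = {1,2,3,4,5};
  return reinterpret_cast<int32_t*>(x.data()+offset)[0];
}

// Copies 4 bytes starting at x.data() + offset into a local int32_t.
int32_t cast_2(int offset) {
  std::vector<char> x = {1,2,3,4,5};
  int32_t y;
  std::memcpy(reinterpret_cast<char*>(&y), x.data() + offset, 4);
  return y;
}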

int main() {
  std::cout << cast_1(1) << std::endl;
  std::cout << cast_2(1) << std::endl;
  return 0;
}

The cast_1 function triggers a UBSan alignment error (as expected), but cast_2 does not. However, cast_2 looks much less readable to me (it requires 3 lines). cast_1 looks perfectly clear in its intent, even though it is UB.

Questions:

1) Why is cast_1 UB, when the intent is perfectly clear? I understand that there may be performance issues with alignment.

2) Is cast_2 a correct approach to fixing the UB of cast_1?

thc

2 Answers


1) Why is cast_1 UB?

Because the language rules say so. Several rules, in fact.

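  1. The offset where you access the object does not meet the alignment requirement of int32_t (except on systems where that requirement is 1). No object can be created at an address that does not satisfy the alignment requirement of its type.

  2. An object of type char may not be accessed through an int32_t pointer (the strict aliasing rule). char, unsigned char and std::byte may be used to access objects of other types, but not the other way around.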
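To make rule 1 concrete, here is a minimal sketch of a runtime alignment check. The helper name is_aligned_for is hypothetical, and it assumes that converting a pointer to std::uintptr_t yields the plain byte address, which is implementation-defined but true on mainstream platforms. Note that passing this check would not make cast_1 valid, because rule 2 still applies.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if p satisfies the alignment
// requirement of T. The pointer-to-integer conversion is
// implementation-defined; on mainstream platforms it is the byte address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

With the vector from the question, is_aligned_for<std::int32_t>(x.data() + 1) is false on any platform where alignof(int32_t) is greater than 1, since the buffer returned by the allocator is itself suitably aligned.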

2) Is cast_2 a correct approach to fixing the UB of cast_1?

cast_2 has well-defined behaviour. The reinterpret_cast in that function is redundant (std::memcpy takes void*, to which any object pointer converts implicitly), and it is bad practice to use magic constants: use sizeof(y) instead of 4.
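If the three-line memcpy pattern feels noisy, one option is to wrap it once in a small template. This is a sketch; the names load_as and cast_3 are hypothetical, not part of any library. It uses sizeof instead of 4 and drops the redundant cast. Reading the bytes into a genuine T object sidesteps both rules above, because memcpy places no alignment requirement on its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy sizeof(T) bytes starting at bytes[offset] into a T.
// Assumes T is trivially copyable (checked) and default-constructible.
template <typename T>
T load_as(const std::vector<char>& bytes, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(offset + sizeof(T) <= bytes.size());  // stay within the buffer
    T value;
    std::memcpy(&value, bytes.data() + offset, sizeof(value));
    return value;
}

// cast_1 rewritten with the helper: same intent, defined behaviour.
std::int32_t cast_3(int offset) {
    std::vector<char> x = {1, 2, 3, 4, 5};
    return load_as<std::int32_t>(x, static_cast<std::size_t>(offset));
}
```

The result is in host byte order, exactly as with cast_2.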

eerorika
  • For `cast_2`, is `std::memcpy(&y, x.data() + offset, sizeof(y));` better? For the 1st question, is there really no better answer than "the rules say so"? – thc Feb 09 '19 at 22:19
  • Well, the reason that anything is UB is just that the standard says so. That's the definition. I described the motivation in my answer. And yes, that code looks good. – Useless Feb 09 '19 at 22:28
  • @thc It's unclear to me what you mean by "better answer". Yes, using `sizeof` would be better, and would guarantee correctness on systems where the size is not 4. As an alternative for the second operand, you could use `&x[offset]`, but that's up to your own preference. – eerorika Feb 09 '19 at 22:41
  • Thanks for both your comments! – thc Feb 09 '19 at 23:08

WRT the first question, it would be trivial for the compiler to handle that for you, true. All it would have to do is pessimize every other non-char load in the program.

The alignment rules were written precisely so the compiler can generate code that performs well on the many platforms where aligned memory access is a fast native op, and misaligned access is the equivalent of your memcpy. If misaligned access through any pointer were allowed, then except where it could prove alignment, the compiler would have to handle every load the slow & safe way.
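To illustrate what "the slow & safe way" can mean, here is a sketch of a possibly-misaligned 32-bit read done one byte at a time. This is roughly what a compiler has to emit on a target that cannot do unaligned word loads cheaply. The function names are hypothetical. The byte-wise version assumes 8-bit bytes and little-endian data. The memcpy version reads in host byte order and leaves instruction selection to the compiler, which on targets with cheap unaligned loads can typically emit a single load.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration: read a little-endian 32-bit value byte by byte.
// Each byte access has alignment 1, so this is valid at any address.
std::uint32_t load_le32_bytewise(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Same read in host byte order; the compiler chooses the instructions.
std::uint32_t load_host32_memcpy(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

On a little-endian host, both functions return the same value for the same bytes. On a big-endian host they differ, because memcpy preserves host byte order.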

Useless