There are a lot of questions about accessing unallocated memory, which is clearly undefined behavior. But what about the following corner case?

Consider the following struct, which is aligned to 16 bytes but whose data occupies only 8 of them:

struct alignas(16) A
{
    float data[2]; // the remaining 8 bytes are unallocated
};

Now we access 16 bytes of data using the SSE aligned load/store intrinsics:

#include <xmmintrin.h> // SSE: __m128, _mm_load_ps, _mm_store_ps

__m128 test_load(const A &a)
{
    return _mm_load_ps(a.data);
}

void test_store(A &a, __m128 v)
{
    _mm_store_ps(a.data, v);
}

Is this also undefined behavior, and should I use explicit padding instead?

Also, since Intel intrinsics are not standard C++: is accessing a partly allocated but aligned memory block (no larger than the alignment) undefined behavior in standard C++?

I'm asking about both the intrinsic case and the standard C++ case; I'm interested in both of them.

plasmacel
  • So the struct is guaranteed to be *aligned* to 16 bytes, not *allocated* 16 bytes. You could make a struct of a single character and align it 1k, but you wouldn't assume the next 1024 - sizeof(char) bytes were yours as well. If alignment were synonymous with allocation, you wouldn't be able to align(16) a struct of a hundred floats because a hundred floats is larger than 16 bytes. – Mr. Llama Dec 14 '16 at 22:48
  • See also http://stackoverflow.com/questions/37800739/is-it-safe-to-read-past-the-end-of-a-buffer-within-the-same-page-on-x86-and-x64. It's UB according to the ISO C++ standard, but I think read-only access like this does work safely on implementations that provide Intel's intrinsics (which are free to define whatever extra behaviour they want). It's definitely safe in asm, but the risk is that optimizing C++ compilers that turn UB into mis-compiled code might cause a problem if they can prove that there's nothing there to read. There's some discussion of that on the linked question. – Peter Cordes Dec 15 '16 at 03:48
  • Writing outside of objects is always bad. Don't do it, not even if you put back the same garbage you read earlier: A non-atomic load/store pair can be a problem depending on what's next. – Peter Cordes Dec 15 '16 at 03:51
  • @PeterCordes Do you want to rewrite your comments as an answer? I think they address the question in more detail than the accepted one. – plasmacel Dec 15 '16 at 09:32
  • I was thinking this was a possible duplicate, so was holding off on posting an answer. It's not, because this question asks about writing. – Peter Cordes Dec 15 '16 at 18:52

2 Answers

See also "Is it safe to read past the end of a buffer within the same page on x86 and x64?" The reading part of this question is basically a duplicate of that.

It's UB according to the ISO C++ standard, but I think read-only access like this does work safely (i.e. compile to the asm that you'd expect) on implementations that provide Intel's intrinsics (which are free to define whatever extra behaviour they want). It's definitely safe in asm, but the risk is that optimizing C++ compilers that turn UB into mis-compiled code might cause a problem if they can prove that there's nothing there to read. There's some discussion of that on the linked question.


Writing outside of objects is always bad. Don't do it, not even if you put back the same garbage you read earlier: A non-atomic load/store pair can be a problem depending on what data follows your struct.

The only time this is OK is in an array where you know what comes next, or where you know the bytes that follow are unused padding. For example, you can write out an array of 3-float structs using 16-byte stores that overlap by 4 bytes. (That assumes no alignas over-alignment, so the array packs the structs together without padding.)
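
As a rough sketch of that overlapping-store pattern (the names Vec3 and store_vec3_array are hypothetical, not from the question, and this relies on the compiler's intrinsics behaving like the underlying instructions rather than on anything ISO C++ guarantees): every element except the last is written with a full 16-byte unaligned store, whose extra 4 bytes land on the next element and are overwritten by the following store. The last element is written with an 8-byte plus a 4-byte store so that nothing past the array is touched.

```cpp
#include <xmmintrin.h> // SSE: _mm_storeu_ps, _mm_storel_pi, _mm_store_ss, _mm_movehl_ps
#include <cstddef>

// Hypothetical 3-float struct, deliberately NOT over-aligned,
// so an array of them is tightly packed (12 bytes per element).
struct Vec3
{
    float x, y, z;
};
static_assert(sizeof(Vec3) == 12, "sketch assumes tightly packed Vec3");

// src[i] holds x, y, z in lanes 0..2; lane 3 is "don't care".
void store_vec3_array(Vec3 *dst, const __m128 *src, std::size_t n)
{
    if (n == 0)
        return;

    for (std::size_t i = 0; i + 1 < n; ++i)
    {
        // 16-byte store: lane 3 spills onto dst[i + 1].x,
        // which the next store (or the tail code below) overwrites.
        _mm_storeu_ps(&dst[i].x, src[i]);
    }

    // Last element: only 12 bytes are ours, so store 8 bytes + 4 bytes.
    const __m128 last = src[n - 1];
    _mm_storel_pi(reinterpret_cast<__m64 *>(&dst[n - 1].x), last); // x, y
    _mm_store_ss(&dst[n - 1].z, _mm_movehl_ps(last, last));        // z (lane 2)
}
```

The point of the tail handling is that the overlap trick is only acceptable where you know what the extra bytes land on; at the end of the array you don't, so the store is narrowed.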


A struct of 3 floats would be a much better example than 2 floats.

For this specific example (of 2 floats) you can just use MOVSD to do a 64-bit zero-extending load, and MOVSD or MOVLPS to do a 64-bit store of the low half of an __m128.
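
A minimal sketch of that 64-bit approach, assuming the struct A from the question (the helper names load_2floats and store_2floats are made up for illustration). It uses _mm_loadl_pi / _mm_storel_pi, the MOVLPS-style intrinsics, rather than _mm_load_sd, to avoid casting the float array to double*; which instruction the compiler actually emits (MOVSD, MOVQ or MOVLPS) is up to the compiler.

```cpp
#include <xmmintrin.h> // SSE: __m128, _mm_setzero_ps, _mm_loadl_pi, _mm_storel_pi

struct alignas(16) A
{
    float data[2];
};

// Load exactly 8 bytes (two floats) into lanes 0-1; lanes 2-3 are zero.
inline __m128 load_2floats(const A &a)
{
    return _mm_loadl_pi(_mm_setzero_ps(),
                        reinterpret_cast<const __m64 *>(a.data));
}

// Store only the low 8 bytes (lanes 0-1) of v; nothing past a.data is written.
inline void store_2floats(A &a, __m128 v)
{
    _mm_storel_pi(reinterpret_cast<__m64 *>(a.data), v);
}
```

Because both helpers touch only the 8 bytes of a.data, the question of what lives in the following 8 bytes never comes up.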

Peter Cordes

A language-lawyer answer to this is "the question is moot". _mm_load_ps is not defined by the C++ standard, and it maps to an assembly instruction that the standard does not define either. C++ simply does not deal with this.

As for your second question: accessing unallocated memory from C++ this way is clearly undefined behavior. No object was placed in that memory, so you can't access it.

SergeyA
  • That's why I address both the intrinsic case AND the standard case. I'm interested in both. – plasmacel Dec 14 '16 at 22:25
  • @plasmacel, answered it as well. – SergeyA Dec 14 '16 at 22:29
  • However the memory is reserved by the alignment. The unallocated values are clearly undefined, but are you sure the behavior is also undefined? – plasmacel Dec 14 '16 at 22:35
  • @plasmacel: Nothing guarantees that alignment bytes are padded out. For instance: `struct B { A a; char data[2]; }` `B.data` would live in what you think is allocated for you. – Guvante Dec 14 '16 at 22:39
  • @plasmacel: right, so you **can safely read it** and ignore the high garbage, but you can't safely write it. (Of course, this specific example is moot, since you can just use MOVSD to do a 64-bit zero-extending load, and MOVSD or MOVLPS to do a 64-bit store of the low half of an `__m128`.) – Peter Cordes Dec 15 '16 at 03:44
  • The question is whether it's UB on an implementation that supports Intel's intrinsics, including any implementation-defined behaviour. Obviously we're only interested in the behaviour of this code with compilers that can compile it. – Peter Cordes Dec 15 '16 at 03:53