So, there are a few questions on SO about this subject, but I haven't quite found one that exactly answers the question I have in mind. First, some background:

I would like to have a uint32_t field, which I can also access as an array of bytes.

So the first thing that comes to mind is:

union U {
    uint32_t u32;
    uint8_t bytes[sizeof(uint32_t)];
};

Which allows me to do this:

// "works", but is UB as far as I understand
U u;
u.u32 = 0x11223344;
u.bytes[0] = 0x55;

OK, so undefined behavior (UB) is bad, therefore we don't want to do that. Similarly, casts are UB and can sometimes be even worse due to alignment concerns (though not in this case, because I'm using a char-sized object for my array).

// "works", but is UB as far as I understand
uint32_t v = 0x11223344;
auto p = reinterpret_cast<uint8_t *>(&v);
p[0] = 0x55;

Once again, UB is bad, therefore we don't want to do that.

Some say that this is OK if we use a char* instead of a uint8_t*:

// "works", but maybe is UB?
uint32_t v = 0x11223344;
auto p = reinterpret_cast<char *>(&v);
p[0] = 0x55;

But I am honestly not sure about it, so let's get creative.

If I remember correctly, it is legal to read the contents of a void* that has been cast to a char* (this is what allows things like std::memcpy to not be UB). So maybe we can play with that:

uint8_t get_byte(const void *p, size_t n) {
    auto ptr = static_cast<const char *>(p);
    return ptr[n];
}

void set_byte(void *p, size_t index, uint8_t v) {
    auto ptr = static_cast<char *>(p);
    ptr[index] = v;
}

// "works", is this UB?
uint32_t v = 0x11223344;
uint8_t v1 = get_byte(&v, 0); // read
set_byte(&v, 0, 0x55);        // write
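For reference, a self-contained version of the final example that compiles as C++11. It is a sketch under one stated change: it uses `unsigned char` instead of `uint8_t`, because `unsigned char` is one of the character types the aliasing rules ([basic.lval]) explicitly allow, whereas whether `uint8_t` is such a type is discussed in the comments and answers below.

```cpp
#include <cstddef>
#include <cstdint>

// Reads the index-th byte of the object representation that p points to.
// The access goes through an unsigned char glvalue, a character type.
inline unsigned char get_byte(const void *p, std::size_t index) {
    auto ptr = static_cast<const unsigned char *>(p);
    return ptr[index];
}

// Overwrites the index-th byte of the object representation that p points to.
// The caller is responsible for leaving a valid bit pattern for the object's type.
inline void set_byte(void *p, std::size_t index, unsigned char v) {
    auto ptr = static_cast<unsigned char *>(p);
    ptr[index] = v;
}
```

With `std::uint32_t v = 0x11223344;`, `get_byte(&v, 0)` yields 0x44 on a little-endian target and 0x11 on a big-endian one, and `set_byte(&v, 0, 0x55)` leaves `v` as 0x11223355 or 0x55223344 respectively, which is the "unspecified value" aspect mentioned in the note below.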

So my questions are:

  1. Is the final example I came up with UB?

  2. If it is, what is the "right" way to do this? I really hope the "correct" way isn't a memcpy to and from a byte array. That would be ridiculous.

  3. (BONUS): suppose I want my get_byte to return a reference (e.g. for implementing operator[]). Is it safe to use uint8_t instead of a literal char when reading the contents of a void*?
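For question 3, one way to sketch a reference-returning accessor, assuming `unsigned char` as the element type rather than `uint8_t`. `byte_ref` and `Word` are hypothetical names used only for illustration, and the sketch makes no claim beyond what the char-access rules discussed in the comments permit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a reference to the index-th byte of the object at p.
inline unsigned char &byte_ref(void *p, std::size_t index) {
    return static_cast<unsigned char *>(p)[index];
}

// Read-only overload, chosen when the object is const.
inline const unsigned char &byte_ref(const void *p, std::size_t index) {
    return static_cast<const unsigned char *>(p)[index];
}

// Hypothetical wrapper: a uint32_t whose bytes are reachable through operator[].
struct Word {
    std::uint32_t u32;

    unsigned char &operator[](std::size_t index) { return byte_ref(&u32, index); }
    const unsigned char &operator[](std::size_t index) const { return byte_ref(&u32, index); }
};
```

For a non-const `Word`, `&u32` is a `std::uint32_t*`, and overload resolution prefers the `void*` overload (plain pointer conversion) over `const void*` (pointer plus qualification conversion), so the mutable reference is returned.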

NOTE: I understand the concerns regarding endianness and portability. They are not a problem for my use case. I think it is acceptable for the result to be an "unspecified value" (in that it is compiler-specific which byte it will read). My question is really focused on the UB aspects ("nasal demons" and similar).
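Since the note treats "which byte comes first" as implementation-specific, a small sketch that asks the implementation at run time by inspecting the first byte of a known value. `is_little_endian` is a hypothetical name; it uses `std::memcpy` so that the check itself does not depend on the aliasing question. In C++20, `std::endian::native` from `<bit>` provides the same information at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the least-significant byte of a uint32_t is
// stored first in memory (mixed-endian layouts are not considered).
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1); // copy the first byte of the object representation
    return first == 1;
}
```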

timrau
Evan Teran
  • any reason you are averse to `std::bitset`? – kfmfe04 Oct 06 '15 at 17:49
  • `memcpy` is not equivalent to type punning via a union. – Puppy Oct 06 '15 at 17:50
  • @kfmfe04, `std::bitset` doesn't fit my use case. It allows (proxied) access to bits, not bytes. – Evan Teran Oct 06 '15 at 17:52
  • @Puppy in which cases would memcpy not be sufficient to replace type punning through a union? Several threads including [this one](https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/BNssB8TjIE4) suggest memcpy is sufficient. Are there caveats I am missing? – Shafik Yaghmour Oct 06 '15 at 18:08
  • `memcpy` is allowed to use *pure magic* to avoid UB. The standard library is not bound by the same rules as the rest of us. – Bo Persson Oct 06 '15 at 18:10
  • @BoPersson, perhaps. But the standard (as far as I understand) allows a pointer to a POD type to be assigned to a `void*` and subsequently read when cast to a pointer to a "character type" – Evan Teran Oct 06 '15 at 18:20
  • I only see the guarantee that if you cast to `void*` and back to the original type, you get the same pointer. Casting the `void*` to some other type is not ok. – Bo Persson Oct 06 '15 at 18:25
  • @BoPersson, I could be wrong, lemme see if I can find relevant parts of standard (or not) – Evan Teran Oct 06 '15 at 18:27
  • @BoPersson, so a quick glance leads me to this: (C++03 3.9.2.4): "A `void*` shall be able to hold any object pointer. A cv-qualified or cv-unqualified (3.9.3) `void*` shall have the same representation and alignment requirements as a cv-qualified or cv-unqualified `char*`" It's a bit indirect, but I read that as "you can assign a pointer to any object to a `void*` and that a `void*` is readable and writable as a `char*`" (since they have the same "representation") – Evan Teran Oct 06 '15 at 18:32
  • @Evan - No. :-) That talks about the representation, and that a `void*` must be able to store a `char*` (even on odd systems where `char*` has a different size than `int*`). An `int*` and a `struct x*` might also have the same representation, but are not directly convertible. – Bo Persson Oct 06 '15 at 18:37
  • @BoPersson, hmm. you seem pretty knowledgeable about this, so I'm inclined to agree. But I could have sworn that there was some wording that allowed `T*` -> `void*` -> `char*` (through casts) specifically for cases like mine here. Surely there is a "proper" way that doesn't involve a `memcpy` or similar?! – Evan Teran Oct 06 '15 at 18:43
  • There is a solution that *does* involve memcpy, but where the compiler might optimize it out (through its *magic* :-) [bit_cast](http://stackoverflow.com/a/12398119/597607) – Bo Persson Oct 06 '15 at 18:49
  • Your second example is correct (if you use `unsigned char` instead - apparently the standard allows `uint8_t` to not be a typedef for uchar although it is on all actual systems); a character type can be used to read or write bytes of another object with a declared type. The only possible danger is if you create a bit pattern that is not a valid value for the object (then it will cause UB when you read the object via its actual type) – M.M Oct 06 '15 at 19:20
  • @M.M, would it be possible to cite a source for this? I was under a similar impression, but @BoPersson has made a compelling case that I was mistaken. – Evan Teran Oct 06 '15 at 19:28
  • No, I'm not arguing against accessing bytes through a `char*`. That is explicitly allowed. It's the conversion through a `void*` that is fishy. – Bo Persson Oct 06 '15 at 19:45
  • @BoPersson, OK, I must have missed something. Are you suggesting that a simple `reinterpret_cast<char *>(&v)[1] = 0x12;` is not UB? – Evan Teran Oct 06 '15 at 19:51
  • @BoPersson, I found an interesting bit in C++03 (3.8.5) about object lifetime. It's a bit long to quote here, so I'll abridge: "[...]If the object will be or was of a non-POD class type, the program has undefined behavior if:[...]the pointer is used as the operand of a static_cast (5.2.9) **(except when the conversion is to void*, or to void* and subsequently to char*, or unsigned char*).**" That last bit implies to me that a `void*` -> `char*` of an object is legal (even for non-POD types since that's the context of this paragraph). Thoughts? – Evan Teran Oct 06 '15 at 19:58
  • For me, `reinterpret_cast<char *>(&v)[1] = 0x12;` breaks the strict-alias rule. – Jarod42 Oct 06 '15 at 20:03
  • Additionally, there are a lot of interesting things in 3.9 about types and object representation. It says things like: "The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T)." So we know that the underlying representation is guaranteed to be the same as `unsigned char[sizeof(T)]` (with alignment requirements). The only outstanding question I have is the legality of accessing an object of type `T` through a pointer of type `char*`. – Evan Teran Oct 06 '15 at 20:03
  • @Evan - This has changed in later standards, so it's a bit hard to know. I think 3.8 talks about memory where the original object is gone (to solve problems with allocators and stuff). I do think (at least in C++14) that accessing through a pointer formed by `reinterpret_cast<char *>(&v)` is allowed. `char*` pointers have special exceptions from the strict aliasing rules. You might still end up with UB though, if the bit pattern written isn't valid for the modified type. – Bo Persson Oct 06 '15 at 20:09
  • Writing through a `char*` breaks the strict aliasing rules, but reading does not. – Jarod42 Oct 06 '15 at 20:11
  • @BoPersson, sure, I think it's fine to accept that I am responsible for ensuring a valid bit-pattern for the type (though in this trivial example, there is no invalid pattern that I'm aware of). I only have draft versions of C++11 and C++14. But if C++11 allows accessing an object via a `char*` cast, then that will work :-). I'll have to check what the drafts say about this :-). Thanks for the interesting discussion! – Evan Teran Oct 06 '15 at 20:16
  • @BoPersson, interestingly, I believe that because of C++11 [Draft] (5.2.10.7), `reinterpret_cast<char *>(&v)` implies that going "through `void*`" is also legal. It says (regarding a conversion between different object types): "the result is `static_cast<cv T2*>(static_cast<cv void*>(v))` if both `T1` and `T2` are standard-layout types". PS: You've been fairly generous with your responses here; if it's a bother, please feel free to say so :-). – Evan Teran Oct 06 '15 at 20:31
  • Unfortunately it also says "converting ... and back to its original type yields the original pointer value.". It doesn't say what happens if you modify the value and *then* convert it back. Or use it while being of the wrong type. Could be that Jarod42 is right, so that reading the bytes is ok but not writing. – Bo Persson Oct 06 '15 at 20:44
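The memcpy route raised in the comments (Shafik Yaghmour's link, Bo Persson's bit_cast answer) can be sketched as follows, assuming C++11. `read_byte` and `write_byte` are hypothetical names. Compilers commonly turn fixed-size `std::memcpy` calls like these into plain loads and stores, but that is an optimization, not a guarantee. In C++20, `std::bit_cast` from `<bit>` covers whole-object conversions such as `std::uint32_t` to `std::array<unsigned char, 4>` directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies v's object representation out and returns one byte.
inline unsigned char read_byte(const std::uint32_t &v, std::size_t index) {
    unsigned char bytes[sizeof(std::uint32_t)];
    std::memcpy(bytes, &v, sizeof(std::uint32_t));
    return bytes[index];
}

// Hypothetical helper: copies v out, patches one byte, and copies it back.
inline void write_byte(std::uint32_t &v, std::size_t index, unsigned char value) {
    unsigned char bytes[sizeof(std::uint32_t)];
    std::memcpy(bytes, &v, sizeof(std::uint32_t));
    bytes[index] = value;
    std::memcpy(&v, bytes, sizeof(std::uint32_t));
}
```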

2 Answers

Why not create a class for that?

Something like:

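#include <array>
#include <cstdint>

// Keeps the value as four bytes, least significant first, so the byte
// order is fixed by this code rather than by the platform.
class MyInt32 {
public:
    std::uint32_t asInt32() const {
        // Widen each byte before shifting: a uint8_t promotes to int, and
        // b[3] << 24 would overflow a signed int when b[3] >= 0x80.
        return std::uint32_t{b[0]}
             | (std::uint32_t{b[1]} << 8)
             | (std::uint32_t{b[2]} << 16)
             | (std::uint32_t{b[3]} << 24);
    }
    void setInt32(std::uint32_t i) {
        b[0] = static_cast<std::uint8_t>(i & 0xFF);
        b[1] = static_cast<std::uint8_t>((i >> 8) & 0xFF);
        b[2] = static_cast<std::uint8_t>((i >> 16) & 0xFF);
        b[3] = static_cast<std::uint8_t>((i >> 24) & 0xFF);
    }
    const std::array<std::uint8_t, 4u>& asInt8() const { return b; }
    std::array<std::uint8_t, 4u>& asInt8() { return b; }
    void setInt8s(const std::array<std::uint8_t, 4u>& a) { b = a; }
private:
    std::array<std::uint8_t, 4u> b{}; // zero-initialized
};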

So you don't have UB, you don't break the aliasing rules, and you manage endianness as you want.

Jarod42
  • I suppose I could. The example I gave is a bit simplified. The real code needs to both read and write as either bytes or integers. I could of course do the bit twiddling for writes too (probably would need a proxy object like `std::bitset` does), but it would be doable. – Evan Teran Oct 06 '15 at 19:27
  • Wow, that's a lot of characters. Seeing this almost convinces me to use unions and `-fno-strict-aliasing`. – rr- Jan 02 '16 at 00:24
  • @rr: unfortunately, in C++, the correct ways are often verbose: `explicit` whereas *implicit* would be better, `const` vs *mutable*, `std::array` vs `T[N]`, `smart_ptr` vs `T*`... – Jarod42 Jan 02 '16 at 02:04

It's perfectly legit (as long as the type is a POD), but uint8_t is not guaranteed to be a character type, so don't use it for this.

Puppy
  • Which is legit? Using the `void*` via a cast to `char*`? As for `uint8_t`, is it not guaranteed to be a "character type"? – Evan Teran Oct 06 '15 at 17:51
  • It's not guaranteed to be a character type. It's only guaranteed to be an unsigned 8 bit integer type. – Puppy Oct 06 '15 at 17:51
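Code that nevertheless relies on `uint8_t` behaving like a character type can at least make that assumption explicit, so that it fails to compile on a platform where `std::uint8_t` is not a typedef for `unsigned char`. A minimal sketch, assuming C++11 `<type_traits>`; it checks the assumption and does not make `uint8_t` access any more defined.

```cpp
#include <cstdint>
#include <type_traits>

// On mainstream implementations std::uint8_t is a typedef for unsigned char,
// but the standard only requires an unsigned integer type of exactly 8 bits.
// This stops the build if that assumption does not hold.
static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "std::uint8_t is not unsigned char on this platform");
```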