
The more I read, the more confused I get.

The last of the related questions is closest to mine, but I got confused by all the talk about object lifetime, and especially about whether it is OK to only read or not.


To get straight to the point, correct me if I'm wrong.

This is fine (gcc gives no warning), and here I'm trying to "read type T (uint32_t) via char*":

uint32_t num = 0x01020304;
char* buff = reinterpret_cast< char* >( &num );

But this is "bad" (gcc also gives a warning), and here I'm trying "the other way around":

char buff[ 4 ] = { 0x1, 0x2, 0x3, 0x4 };
uint32_t num = *reinterpret_cast< uint32_t* >( buff );

How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)? Plus, adding const does not change the situation in any way.

Or is this just a straight rule, which clearly states: "this can be done in one direction, but not in the other"? I couldn't find anything relevant in the standards (I searched the C++11 standard in particular).

Is this the same for C and C++? (I read a comment implying it's different for the two languages.)


I used a union to "work around" this, which still appears not to be 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the member of the union that was last modified).

So, after reading a lot, I'm now more confused. I guess only memcpy is the "good" solution?


Related questions:


EDIT
The real-world situation: I have a third-party lib (http://www.fastcrypto.org/) which calculates a UMAC, and the returned value is in a char[ 4 ]. I then need to convert this to uint32_t. And, by the way, the lib itself uses things like ((UINT32 *)pc->nonce)[0] = ((UINT32 *)nonce)[0] a lot. Anyway.

Also, I'm asking about what is right, what is wrong, and why, not only about reordering, optimization, etc. (Interestingly, with -O0 there are no warnings; they only appear with -O2.)

And please note: I'm aware of the big/little-endian issue. It's not the point here; I really want to ignore endianness. The "strict aliasing rules" sound like something really serious, far more serious than wrong endianness. I mean something like accessing/modifying memory that is not supposed to be touched, or any kind of UB at all.

Quotes from the standards (C and C++) would be really appreciated. I couldn't find anything about aliasing rules or anything relevant.

Kiril Kirov
  • buff might not even be suitably aligned... – Marc Glisse Jan 30 '15 at 16:03
  • @MarcGlisse - ahaa.. that's logical. But this does not have anything to do with reordering and optimizations, does it? (The warning is gone when -O0 is used instead of -O2.) – Kiril Kirov Jan 30 '15 at 16:04
  • "How is the second one different from the first one," I assume you mean strictly with regards to addressing and aliasing, because that code is non-portable. Even if alignment weren't an issue, the value of `num` in the latter is not guaranteed to be equivalent to the initial value of `num` in the former unless you're on a big-endian platform. – WhozCraig Jan 30 '15 at 16:06
  • @WhozCraig - Yes, I'm aware of the big/little-endian issue. And yes, I'm asking whether it's portable and reliable, and if not, why (I mean, I'm not interested only in the code reordering). – Kiril Kirov Jan 30 '15 at 16:11
  • I understand. It's a great question; I just didn't want the casual novice to see that and think it's some silver bullet for their raw-bytes-to-`uint32` woes. Uptick on your question, btw. No one sane can claim a downvote for lack of research on your part. – WhozCraig Jan 30 '15 at 16:14
  • Another related question, if you like :) http://stackoverflow.com/questions/25994127/setting-a-buffer-of-char-with-intermediate-casting-to-int – Antonio Jan 30 '15 at 16:30
  • The rule starts with "If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [...]". In your first case, the "object" is a `uint32_t` and you are accessing it through a glvalue of type `char`, which is allowed; in your second case, the "object" is either a `char` or an array of `char`s, and you are accessing it through a glvalue of type `uint32_t`, which is not any of the allowed types. – T.C. Jan 30 '15 at 16:31
  • For extra fun, add `new(buff)uint32_t(42);` somewhere and think of aligned_storage ;-) – Marc Glisse Jan 30 '15 at 16:34
  • @T.C. - could you say what exactly a `glvalue` is? And I think I got confused by the term "object": I only thought of it in an OOP context and didn't consider a `uint32_t` to be an object. – Kiril Kirov Jan 30 '15 at 16:35
  • [intro.object]/p1: "The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An *object* is a region of storage." A *glvalue* is something that's either an *lvalue* or an *xvalue*; http://stackoverflow.com/questions/3601602/what-are-rvalues-lvalues-xvalues-glvalues-and-prvalues has all the gory details. – T.C. Jan 30 '15 at 16:49
  • Perhaps this could be of help: http://dbp-consulting.com/tutorials/StrictAliasing.html – Super-intelligent Shade Jan 30 '15 at 17:04
  • @KirilKirov: in C standardese, and then in C++ standardese, an _object_ is a _region of storage_. What you colloquially call an object would in standardese be called _an object of class type_. – ninjalj Jan 30 '15 at 20:10
  • @InnocentBystander - that article is great! Thanks. – Kiril Kirov Feb 02 '15 at 12:59

2 Answers


How is the second one different from the first one, especially when we're talking about reordering instructions (for optimization)?

The problem is that the compiler uses these rules to decide whether such an optimization is allowed. In the second case you're reading a char[] object through an incompatible pointer type, which is undefined behavior; hence the compiler may reorder the read and the write (or do anything else you might not expect).

But there is an exception for "going the other way", i.e. reading an object of any type via a character type.
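To illustrate what that exception means for the optimizer, here is a small sketch (the function names are made up for illustration). A store through `unsigned char*` may modify any object, so afterwards the compiler has to reload `*p` rather than assume it still holds the value just written. With a non-character pointer type such as `std::uint16_t*` in place of `unsigned char*`, it would be entitled to assume the two pointers don't alias, and a call where both refer to the same object would be undefined behavior.

```cpp
#include <cstdint>

// Stores a known pattern into *p, then writes one byte through `c`.
// Because an lvalue of character type may access any object, the compiler
// must assume `c` can point into *p, so it has to re-read *p before
// returning instead of returning the constant 0x01010101 it just stored.
std::uint32_t store_then_read(std::uint32_t* p, unsigned char* c) {
    *p = 0x01010101u;
    *c = 0;        // may legitimately modify one byte of *p
    return *p;     // must be reloaded from memory
}

// Well-defined call: `c` points at the first byte of the very same object.
std::uint32_t demo_char_alias() {
    std::uint32_t x = 0;
    unsigned char* first_byte = reinterpret_cast<unsigned char*>(&x);
    return store_then_read(&x, first_byte);
}
```

On a little-endian machine `demo_char_alias()` returns 0x01010100; on a big-endian one, 0x00010101. Either way it is not the constant that was stored.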

Or is this just a straight rule, which clearly states: "this can be done in one direction, but not in the other"? I couldn't find anything relevant in the standards (I searched the C++11 standard in particular).

See section 3.10, paragraph 10 of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf.

In C99, and also C11, it's 6.5 paragraph 7. For C++11, it's 3.10 ("Lvalues and Rvalues").

Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of character type for C or of either unsigned char or char type for C++). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
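A minimal sketch of the permitted direction (the helper name is hypothetical): the bytes of a `uint32_t` are read through an `unsigned char` lvalue, which is one of the listed types. The reverse, dereferencing a `reinterpret_cast<uint32_t*>` of a char array, accesses `char` objects through a `uint32_t` lvalue, which is not on the list.

```cpp
#include <cstddef>
#include <cstdint>

// Allowed direction: inspect the bytes of a uint32_t through an lvalue of
// type unsigned char, copying them into `out` (which must hold 4 bytes).
// The order of the bytes in `out` depends on the host's endianness.
void object_bytes(std::uint32_t num, unsigned char out[4]) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&num);
    for (std::size_t i = 0; i < sizeof num; ++i)
        out[i] = p[i];   // reads part of a uint32_t via unsigned char: OK
}
```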

I used a union to "work around" this, which still appears not to be 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the member of the union that was last modified).

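Although the wording of the standard is horribly ambiguous, in C99 (and beyond) it's clear (at least since C99 TC3) that the intent is to allow type-punning through a union. You must, however, perform all accesses through the union. It's also not clear that you can "cast a union into existence"; that is, the union object must exist before you use it for type-punning. (C++ is different: its standard gives no such permission, so reading a member other than the one last written is in general undefined there, although GCC documents union type-punning as supported.)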

the returned value is in char[ 4 ]. Then I need to convert this to uint32_t

Just use memcpy or manually shift the bytes to the correct position, in case byte-ordering is an issue. Good compilers can optimize this out anyway (yes, even the call to memcpy).
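Both suggestions could look roughly like this for the UMAC case. The function names are hypothetical, and the big-endian order in the second function is an assumption; check which byte order the library actually produces. `memcpy` copies into a genuine `uint32_t` object, so neither aliasing nor alignment is a concern, while the shift version fixes the byte order independently of the host.

```cpp
#include <cstdint>
#include <cstring>

// Host byte order: copy the 4 bytes into a real uint32_t object.
// No aliasing or alignment problem, because num is a genuine uint32_t
// and memcpy only accesses bytes.
std::uint32_t from_bytes_host_order(const char buff[4]) {
    std::uint32_t num;
    std::memcpy(&num, buff, sizeof num);
    return num;
}

// Fixed byte order (assumed big-endian here): assemble the value
// arithmetically. The cast to unsigned char prevents sign extension
// when plain char is signed.
std::uint32_t from_bytes_big_endian(const char buff[4]) {
    std::uint32_t num = 0;
    for (int i = 0; i < 4; ++i)
        num = (num << 8) | static_cast<unsigned char>(buff[i]);
    return num;
}
```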

davmac
  • Both cases are using "incompatible pointer type"s. So, you're saying, that the exception about `char*` is **only** for the one way and not the other? – Kiril Kirov Jan 30 '15 at 16:26

I used a union to "work around" this, which still appears not to be 100% OK, as it's not guaranteed by the standard (which states that I can only rely on the member of the union that was last modified).

Endianness is the reason for this. Specifically, the sequence of bytes 01 00 00 00 could mean 1 (little-endian) or 16,777,216 (big-endian).
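To make that concrete, a short sketch (function names are illustrative) that reads the same four bytes under each convention:

```cpp
#include <cstdint>

// Interpret b[0..3] least significant byte first (little-endian).
std::uint32_t read_le32(const unsigned char b[4]) {
    return static_cast<std::uint32_t>(b[0])
         | static_cast<std::uint32_t>(b[1]) << 8
         | static_cast<std::uint32_t>(b[2]) << 16
         | static_cast<std::uint32_t>(b[3]) << 24;
}

// Interpret the same bytes most significant byte first (big-endian).
std::uint32_t read_be32(const unsigned char b[4]) {
    return static_cast<std::uint32_t>(b[0]) << 24
         | static_cast<std::uint32_t>(b[1]) << 16
         | static_cast<std::uint32_t>(b[2]) << 8
         | static_cast<std::uint32_t>(b[3]);
}
```

For the bytes 01 00 00 00, `read_le32` gives 1 and `read_be32` gives 16,777,216 (0x01000000).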

The correct way to do what you are doing is to stop trying to trick the compiler into doing the conversion for you and perform the conversion yourself.

For instance, if the char[4] is little-endian (least significant byte first), you would do something like the following.

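#include <cstdint>
unsigned char buff[4] = { 0x04, 0x03, 0x02, 0x01 }; // example input: 0x01020304, least significant byte first
uint32_t result = 0;
for (int i = 3; i >= 0; i--)    // start from the most significant byte, buff[3]
    result = (result << 8) | buff[i];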

This performs the conversion by hand and gives the same result on any host, because it is pure arithmetic on the byte values rather than a reinterpretation of memory.

Now, if you were doing this conversion frequently, it might make sense to use #if and knowledge of your architecture to do it with a union, as you mentioned, but that again moves away from portable solutions. (You can also use something like the loop above as your fallback if you can't be certain.)
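A rough sketch of that idea, with two substitutions worth flagging: it relies on the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` (compilers that don't define them take the portable branch), and it uses `memcpy` rather than a union for the fast path, since in C++ reading a union member other than the last one written is not sanctioned by the standard. `load_le32` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Reads a little-endian uint32_t from 4 bytes.
std::uint32_t load_le32(const unsigned char buff[4]) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    // Host is known to be little-endian: the bytes are already in host order.
    std::uint32_t result;
    std::memcpy(&result, buff, sizeof result);
    return result;
#else
    // Portable fallback: assemble the value arithmetically,
    // most significant byte (buff[3]) first.
    std::uint32_t result = 0;
    for (int i = 3; i >= 0; --i)
        result = (result << 8) | buff[i];
    return result;
#endif
}
```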

Guvante