2

The following code is the classical, infamous type-punning union. Similar code is legal in C99/C11 but illegal in C++, because one has to read from the inactive member.

To activate the inactive member one has to write to it (that is not appropriate here) or to construct it via placement-new.

The code below does exactly this: it initializes member a and then activates member b by placement-new. In case of an array of primitive types, the construction is actually a noop, and the bitvalues remain the same.

I never saw this solution in the numerous other questions / answers to this topix, so I wonder if the code below in fact circumvents UB really.

#include <cstdint>
#include <cstddef>
#include <utility>
#include <array>
#include <tuple>
#include <cassert>

union ByteUnion32 {
public:
    explicit ByteUnion32(uint32_t v) : a{v}{
        new (&b) std::byte[sizeof(uint32_t)]; // set b active, but prevents constexpr, actually a noop
    }
    std::byte operator[](uint8_t i) const {
            return b[i];
    }
private:
    uint32_t a;
    std::byte b[sizeof(uint32_t)];
};
int main(){
    ByteUnion32 u10{0x01020304}; // initialise and activate byte-memebers
    auto v10 = std::to_integer<uint8_t>(u10[0]);    
    assert(v10 == 4);

}
geza
  • 28,403
  • 6
  • 61
  • 135
wimalopaan
  • 4,838
  • 1
  • 21
  • 39
  • converts std::byte back to uint-type, should be irrelavant for the question – wimalopaan Nov 15 '17 at 15:36
  • Interesting question, but I think this is UB, at least that's the intention of the standard, if it is not explicitly said. – geza Nov 15 '17 at 15:37
  • Here some references stating to use placement new: https://stackoverflow.com/questions/39763548/memcpy-memmove-to-a-union-member-does-this-set-the-active-member – wimalopaan Nov 15 '17 at 16:36
  • 1
    also: http://en.cppreference.com/w/cpp/language/union – wimalopaan Nov 15 '17 at 16:37
  • 1
    ans: http://talesofcpp.fusionfenix.com/post-17/eggs.variant---part-i – wimalopaan Nov 15 '17 at 16:37
  • There's nothing wrong with placement new. Where I sense some problem is that: first, you set active member to `a`. Then, with placement new, you change active member to `b`. So far, so good. Then, you read from `b`, which is not "initialized" yet, but you depend on the previous value of `a`. And to make the issue more difficult, you use `std::byte`, for which aliasing is allowed. You could have just used `reinterpret_cast`, which is not UB for sure. – geza Nov 15 '17 at 16:48
  • Well, my thought was to active `b` (new-op) and then read it, because it is activen then, assuming, that the bits remain the same. Well, it could be a cast under the same assumption ... – wimalopaan Nov 15 '17 at 16:55
  • But what if one wants to provide the opposite direction: then she has to cast to uint32_t, that violates strict aliasing, I think? – wimalopaan Nov 15 '17 at 16:56
  • Yes. "assuming, that the bits remain" - because of that part, I think that it is UB. But I'm not 100% sure. On the other hand, the cast is well defined for sure. And yes, the other direction is UB in the case of cast. The general advice here is to use memcpy, if you want to be absolutely sure that you're standard conformant. Or just use unions, it works with GCC, Clang, MSVC (but remember, according to the standard, it is UB). – geza Nov 15 '17 at 17:00
  • I thought, the above "technique" using placement-new could be a way of getting rid of UB. – wimalopaan Nov 15 '17 at 17:04
  • Maybe. I'm not skilled enough to judge this, but my feeling is that it is still UB. I'd recommend you to use memcpy, as its efficiency is the same as the union variant (using modern compilers). – geza Nov 15 '17 at 17:10

2 Answers2

1

From [dcl.init]/12:

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]). [...] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases: [...]

In this code:

assert(v10 == 4);

Either the initialization of v10 itself is undefined behavior due to not fitting any of the bullet points (I'm not entirely sure about this), or the initialization of v10 is fine and it just is considered to have indeterminate value, at which point the ultimate comparison to 4 is undefined behavior.

Either way, this is undefined behavior.

Barry
  • 286,269
  • 29
  • 621
  • 977
0

Instead of using a union to provide both conversion direction, the following maybe a solution in C++ using two specializations for the conversion directions std::byte to larger integer-type or integer-type to std::byte.

#include <cstdint>
#include <cstddef>
#include <utility>
#include <array>
#include <tuple>
#include <cassert>
#include <cstring>

namespace std {
    enum class endian { // since c++20
#ifdef _WIN32
        little = 0,
        big    = 1,
        native = little
#else
        little = __ORDER_LITTLE_ENDIAN__,
        big    = __ORDER_BIG_ENDIAN__,
        native = __BYTE_ORDER__
#endif
    };
    namespace literals {
        constexpr byte operator"" _byte(unsigned long long v) { // <> Literal-Operator for suffix `_byte`
            return byte{static_cast<uint8_t>(v)};
        }
    }
}

template<typename T>
requires (std::is_same<uint64_t, T>::value || std::is_same<uint32_t, T>::value || std::is_same<uint16_t, T>::value) && (sizeof(T) > 1)
struct ByteUnion<ByteToIntegral, T> {
public:
    template<typename... B>
    requires (std::is_same<B, std::byte>::value || ...) && (sizeof...(B) == sizeof(T))
    explicit ByteUnion(B... vv) : ByteUnion{endian_type{}, std::index_sequence_for<B...>{}, vv...} {}

    const T& value() const {return a;}     
private:
    template<size_t... II, typename... B>
    explicit ByteUnion(LittleEndian, std::index_sequence<II...>, B&&... vv) {
        uint8_t* p = reinterpret_cast<uint8_t*>(&a);
        ((p[sizeof...(II) - 1 - II] = static_cast<uint8_t>(vv)),...);    
    }
    template<size_t... II, typename... B>
    explicit ByteUnion(BigEndian, std::index_sequence<II...>, B&&... vv) {
        uint8_t* p = reinterpret_cast<uint8_t*>(&a);
        ((p[II] = static_cast<uint8_t>(vv)),...);
    }
    T a;
};

using namespace std::literals;

int main(){
    ByteUnion<IntegralToByte, uint64_t> u1{0x01020304}; // initialise and activate byte-memebers
//    ByteUnion<ByteToIntegral, uint64_t> u1x{0x01020304}; // not possible
    auto v1 = std::to_integer<uint8_t>(u1[0]);    
//    u1.value(); // not possible, who access inactive member
    assert(v1 == 4);

    ByteUnion<ByteToIntegral, uint32_t> u2{0x01_byte, 0x02_byte, 0x03_byte, 0x04_byte}; // initialise and activate integral value member
//    ByteUnion<IntegralToByte, uint32_t> u2x{0x01_byte, 0x02_byte, 0x03_byte, 0x04_byte}; // not possible
    auto v2 = u2.value(); 
    assert(v2 == 0x01020304);
//    u2[0];

}
wimalopaan
  • 4,838
  • 1
  • 21
  • 39
  • As far as I know, this is UB. You write to `a` as `uint8_t`, then read from it as a different type. You can read the underlying bytes as `std::byte` or `(unsigned) char`, but you cannot write it as `byte/char` then read as a different type. It works only in one way. – geza Nov 15 '17 at 18:18
  • In the BigEndian case I can replace the fold-expression with a memcpy, then it should be fine. In the LittleEndian case my thought was , that the fold-expression is essentially the same as a memcpy. – wimalopaan Nov 15 '17 at 22:10
  • What do you want to achieve in the end? Maybe your approach is not ideal, and there is a better solution... – geza Nov 15 '17 at 22:13
  • The goal is the implement the same as type-punning through unions as in C99/C11 with additionally regarding the byte-order problems. maybe I'm to focused on the union-approach. – wimalopaan Nov 15 '17 at 22:15
  • But why do you want to achieve this with type-punning? What's the reason the you choose this way? With unions? What's the big picture, why do you need this? Maybe it's better to do the endianness conversion in an `int`. For example, on x86, there's an instruction for this, which is accessible from gcc and msvc as well. – geza Nov 15 '17 at 22:17
  • The most basic use-case is in µC applications: getting a number of bytes via some sort of interface that must be combined into a integral number or the other way round. In C99/C11 the fastest way to do this is via type-punning using unions. In C++ one safely can to the conversion using the byte-shift approach, but is is a performance burden on small say 8-Bit µC. – wimalopaan Nov 15 '17 at 22:21
  • Which compiler do you use? – geza Nov 15 '17 at 22:24
  • Well, the solution should be compiler-agnostic. Actually I'm using gcc, mainly because of concepts. – wimalopaan Nov 15 '17 at 22:25
  • Maybe you should ask this question :). Type punning through unions is UB in C++. But, as I've said earlier, it does work in gcc, clang, and msvc too. And I've read somewhere, that *maybe* the committee will relax the rules of type punning through unions, if the union has standard layout. But, to be honest, I would not go for compiler-angostic in this case. GCC has __builtin_bswap32, MSVC has its thing too, that's the most efficient way to change endianness. And if, you don't use this compilers, then your original solution can kick in. – geza Nov 15 '17 at 22:42
  • One can use a solution with a combination of C11 and C++, that is use C-functions that operate on the unions to do the actual punnind. This is a bit of effort due to the lack of templates but it should work. Due to lto there should be no performance penalty. – wimalopaan Nov 15 '17 at 22:45
  • It seems that I was wrong here: if you use `std::byte` or `(unsigned) char` to read or **modify**, then it is not UB. http://en.cppreference.com/w/cpp/language/reinterpret_cast: "Whenever an attempt is made to read or **modify** the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true: [...] AliasedType is std::byte, char, or unsigned char" – geza Nov 16 '17 at 10:15