
I have an object of type std::wstring and a library function that takes const uint32_t* as an argument. My code runs only on platforms where sizeof(wchar_t) == 4. The library (harfbuzz) has no API that takes const wchar_t*, so I must convert my std::wstring. I know that reinterpret_cast is not safe here (it violates the strict aliasing rule), but I don't want to create a std::vector<uint32_t>, resize it and memcpy my whole string into it. I'm 100% sure that the compiler will not optimize that copy out. And it seems that std::bit_cast can't help me either, since it can only convert a single wchar_t to uint32_t.

I see only one solution: use reinterpret_cast. The string will be changed neither in my code nor in the library, so I don't think that violating the strict aliasing rule is a problem in this case.

Is there any other option? What do you do when the type of the data in your code doesn't match the input type of some C library (int32_t* vs uint32_t*, char16_t* vs unsigned short*, etc.)?

  • `using alt_string = std::basic_string<uint32_t>;`? – Marek R Feb 16 '21 at 15:39
  • `I have an object of type std::wstring` Do you have an option of not having a `std::wstring` in the first place? – eerorika Feb 16 '21 at 15:39
  • Unfortunately C++ is broken in this way, as there's no legal way out of strict aliasing. In practice a `reinterpret_cast` will often appear to work as intended, as the compiler has no place to "store" an entire array of values so aliasing doesn't really matter (of course it will fail on e.g. Win32 where `wchar_t` is 16 bit). – rustyx Feb 16 '21 at 15:40
  • @eerorika Unfortunately, no – Filipp Slavnejshev Feb 16 '21 at 15:45
  • Also, I don't think it's a good strategy to write your code in such a way that it depends on the types used in the API of some 3rd party library. What if the library's API changes? What if one library is replaced with another (and its API differs from the first one)? – Filipp Slavnejshev Feb 16 '21 at 15:51
  • I don't fully understand it, but I think [std::launder](https://stackoverflow.com/a/39382728/1387438) may help to protect the result of `reinterpret_cast` (at least the example at the bottom suggests that). – Marek R Feb 16 '21 at 15:59
  • @MarekR std::launder alone doesn't help here. – eerorika Feb 16 '21 at 16:10
  • "*My code is running on platforms where `sizeof(wchar_t) == 4`*" - what about using `std::u32string` (`std::basic_string<char32_t>`) instead? Then your code can still work even on platforms where `sizeof(wchar_t) == 2`. You would still need a `reinterpret_cast` to go from `char32_t*` to `uint32_t*`, but at least they are guaranteed to be the same bit size. – Remy Lebeau Feb 16 '21 at 18:16

2 Answers


The only way in standard C++ to avoid both a type aliasing violation and a copy is to reuse the storage of the std::wstring buffer and "create" the array of std::uint32_t in its place:

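#include <bit>
#include <cstddef>
#include <new>
#include <span>
#include <type_traits>

// Precondition: src is not empty.
template<class Dst, class Src>
Dst*
reinterpret(std::span<Src> src) noexcept
{
    static_assert(sizeof(Src) == sizeof(Dst));
    static_assert(alignof(Dst) <= alignof(Src));
    static_assert(std::is_trivial_v<Src>);
    static_assert(std::is_trivial_v<Dst>);

    Src* storage = src.data();
    std::size_t size = src.size();
    // Read each element as Src first, then create a Dst holding the
    // same bits in the same storage.
    Dst first = std::bit_cast<Dst>(src[0]);
    Dst* result = ::new (storage) Dst(first);
    for(std::size_t i = 1; i < size; i++) {
        Dst value = std::bit_cast<Dst>(src[i]);
        ::new (storage + i) Dst(value);
    }
    return result;
}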

This makes no changes to the data; its only effect is to avoid the type aliasing violation. A good optimiser compiles the function to:

mov     rax, rdi
ret

The drawback is that you rely on the optimiser to do its job. There's no guarantee that the loop is optimised away.

This is basically what the std::start_lifetime_as_array library function (a proposal at the time of writing, since added in C++23) is supposed to do.


A non-standard alternative is to disable the type aliasing restrictions, at the cost of the optimisations that they enable, assuming your compiler supports such an option.


P.S. Array placement new cannot be used for this purpose (and probably not for any purpose).

eerorika

If you need strict standard compliance, the only option is to memcpy from c_str() of the std::wstring into a uint32_t array. You can just shout @#! strict aliasing rule, if it calms you a little...
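A minimal sketch of that copy, assuming a platform where wchar_t is 32 bits wide. The names `to_u32_copy` and `count_non_ascii` are hypothetical; the latter only stands in for a C library function that takes const uint32_t* and is not part of harfbuzz.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumption: this only makes sense where wchar_t is 32 bits wide.
static_assert(sizeof(wchar_t) == sizeof(std::uint32_t),
              "wchar_t must be 32 bits wide");

// Hypothetical helper: copies the code units of `text` into a uint32_t buffer.
// memcpy copies bytes, so no object is read through a pointer of the wrong type.
std::vector<std::uint32_t> to_u32_copy(const std::wstring& text)
{
    std::vector<std::uint32_t> out(text.size());
    if (!text.empty()) {
        std::memcpy(out.data(), text.data(), text.size() * sizeof(wchar_t));
    }
    return out;
}

// Hypothetical stand-in for a C library entry point taking const uint32_t*.
std::size_t count_non_ascii(const std::uint32_t* text, std::size_t length)
{
    assert(text != nullptr || length == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (text[i] > 0x7Fu) {
            ++count;
        }
    }
    return count;
}
```

The caller keeps the vector alive for the duration of the library call and passes `buffer.data()` and `buffer.size()`. The cost is one allocation and one copy per call.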

If you are just a real world programmer using real world compilers, most of them (if not all) know that the strict aliasing rule can sometimes be a little too strict and offer, as a documented extension, an option to relax it. My advice is to use that option on the compilation unit where you do the abusive casting (C style cast or reinterpret_cast), but to make that compilation unit as small as possible, so that the optimizing compiler can still assume strict aliasing in all other compilation units.
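As a rough sketch of that advice, the cast can live alone in one small source file that is built with the relaxed option (on GCC and Clang the flag is -fno-strict-aliasing). The file name and the function name `as_u32` are hypothetical, and the cast is still not valid standard C++; it relies entirely on the compiler extension.

```cpp
// aliasing_cast.cpp (hypothetical file name): the only translation unit
// meant to be compiled with the relaxed aliasing option.
#include <cassert>
#include <cstdint>

// Assumption: wchar_t and uint32_t have the same size and alignment here.
static_assert(sizeof(wchar_t) == sizeof(std::uint32_t),
              "wchar_t must be 32 bits wide");
static_assert(alignof(wchar_t) == alignof(std::uint32_t),
              "wchar_t and uint32_t must share alignment");

// Returns the same address typed as const uint32_t*.
// Reading through the result is only acceptable because this file is built
// with strict aliasing disabled; the standard itself does not allow it.
const std::uint32_t* as_u32(const wchar_t* text) noexcept
{
    assert(text != nullptr);
    return reinterpret_cast<const std::uint32_t*>(text);
}
```

Every other file calls `as_u32(str.c_str())` through a plain declaration in a header, so the rest of the program is still compiled with strict aliasing assumed.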

Serge Ballesta