
I'd like to have a type that behaves like unsigned char:

  • sizeof is 1
  • integer values can be assigned to it (without any casts)
  • bit manipulations are allowed
  • arithmetic is allowed, but not a must
  • unsigned
  • trivially copyable

But, unlike unsigned char, it must not be allowed to alias. That is, I want a type that is not covered by the exception in [basic.lval/11.8]:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

[...]

  • a char, unsigned char, or std::byte type.

Is it possible to have a type like this?

The reason: I almost never use unsigned char's aliasing property, so I'd like to use a type that doesn't prevent certain kinds of optimizations. (I'm asking because I actually have functions that aren't optimized well precisely because unsigned char is allowed to alias.) In other words, I'd like a type for which "don't pay for what you don't use" holds.


Here's an example where unsigned char prevents an optimization: "Using this pointer causes strange deoptimization in hot loop".

geza
  • static_assert that std::uint8_t is available. – Richard Critten Sep 08 '18 at 20:26
  • @RichardCritten `std::uint8_t` won't help much, since most implementations typedef it to `unsigned char`. See https://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char#comment23169800_16138470 – Swordfish Sep 08 '18 at 20:43
  • Why does it matter so much that integer values can be assigned to it *without casts* that you'd edit the question in response to a solution that requires it? I can't think of a reason why writing `some_type{n}` instead of `n` is a dealbreaker. – Justin Sep 08 '18 at 21:34
  • @Justin: "integer values can be assigned to it". For me, it already means `a = 42;`, with no casts. For your solution, it is not true, that "integer values can be assigned to it". You cast it, so you assign an enum, not an integer. I've edited the question to make this clear, not to discredit your answer. – geza Sep 08 '18 at 21:37
  • You can use a couple of `#ifdef`'s and `restrict`, `__restrict` or `__restrict__`. –  Sep 08 '18 at 21:47
  • @geza: Are you using this for string manipulation or manipulating a sequence of 8-bit integers? – Nicol Bolas Sep 08 '18 at 22:02
  • @NicolBolas: I don't have specific area in mind. It happened a lot of times before that aliasing behavior of `unsigned char` made my code slower. I'd like to have a type, which can be used in almost every situation instead of `unsigned char`. But I understand if this cannot be done. I've asked this question, because I'm not very much up-to-date with current C++ standard, maybe something has changed that makes this possible. – geza Sep 08 '18 at 22:15
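
The `restrict` suggestion from the comments could look roughly like the sketch below. The macro name `NOALIAS` and the function `fill_bytes` are made up for illustration; the only things taken from the comment are the `#ifdef` idea and the `__restrict` / `__restrict__` spellings, which are compiler extensions rather than standard C++. Note that this is a per-pointer promise made by the programmer, not a property of the type, so it does not give a drop-in replacement for `unsigned char`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: pick whichever spelling the compiler understands.
#if defined(_MSC_VER)
#  define NOALIAS __restrict
#elif defined(__GNUC__) || defined(__clang__)
#  define NOALIAS __restrict__
#else
// Unknown compiler: fall back to no annotation at all.
#  define NOALIAS
#endif

// `count` is passed by pointer on purpose. Without the annotation, a store
// through `dst` (an unsigned char*) is allowed to modify *count, so the
// compiler may have to reload *count on every iteration.
inline void fill_bytes(unsigned char* NOALIAS dst,
                       const std::size_t* NOALIAS count,
                       unsigned char value) {
    assert(dst != nullptr && count != nullptr);
    for (std::size_t i = 0; i < *count; ++i) {
        dst[i] = value;
    }
}
```

The caller has to make sure the annotated pointers really do not overlap; the compiler will not check it.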

2 Answers

3

That section of the standard calls out char, unsigned char, and std::byte. However, you can make your own type which is like std::byte and it wouldn't be allowed to alias:

enum class my_byte : unsigned char {};

Using it isn't so nice, as you have to cast to unsigned char to do anything meaningful with it. However, you can overload the bitwise and arithmetic operators to make it nicer to work with.
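
As a rough sketch of what those overloads might look like: the helper names `to_uint` and `to_byte` are my own and not part of any library, and only a handful of operators are shown. Every result is truncated back to 8 bits before it is converted to the enum, so the intermediate `unsigned int` value never falls outside the enum's range.

```cpp
#include <type_traits>

enum class my_byte : unsigned char {};

// Hypothetical helpers: unwrap to an unsigned int, and wrap an integer
// after truncating it to the underlying 8-bit type.
constexpr unsigned int to_uint(my_byte b) noexcept {
    return static_cast<unsigned char>(b);
}
constexpr my_byte to_byte(unsigned int v) noexcept {
    return static_cast<my_byte>(static_cast<unsigned char>(v));
}

// Bit manipulation.
constexpr my_byte operator|(my_byte l, my_byte r) noexcept { return to_byte(to_uint(l) | to_uint(r)); }
constexpr my_byte operator&(my_byte l, my_byte r) noexcept { return to_byte(to_uint(l) & to_uint(r)); }
constexpr my_byte operator^(my_byte l, my_byte r) noexcept { return to_byte(to_uint(l) ^ to_uint(r)); }
constexpr my_byte operator~(my_byte b) noexcept { return to_byte(~to_uint(b)); }
constexpr my_byte operator<<(my_byte b, unsigned int n) noexcept { return to_byte(to_uint(b) << n); }
constexpr my_byte operator>>(my_byte b, unsigned int n) noexcept { return to_byte(to_uint(b) >> n); }

// Arithmetic wraps modulo 256, like unsigned char stored back into itself.
constexpr my_byte operator+(my_byte l, my_byte r) noexcept { return to_byte(to_uint(l) + to_uint(r)); }
constexpr my_byte operator-(my_byte l, my_byte r) noexcept { return to_byte(to_uint(l) - to_uint(r)); }

// The properties asked for in the question, apart from cast-free assignment.
static_assert(sizeof(my_byte) == 1, "same size as unsigned char");
static_assert(std::is_trivially_copyable<my_byte>::value, "trivially copyable");
```

This still does not make `a = 42;` compile, because a scoped enum has no implicit conversion from integers.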


We can verify this with the following simple function:

auto foo(A& a, B& b) {
    auto lhs = b;
    a = 42;
    auto rhs = b;
    return lhs + rhs;
}

If A were allowed to alias with B, the compiler would have to generate two loads of b: one for lhs and one for rhs, because the store to a in between might have changed b. If A is not allowed to alias with B, the compiler can generate a single load and just add the value to itself. Let's test it:

#include <cstddef>  // std::byte

// int& cannot alias with long&
auto foo(int& a, long& b) {
    auto lhs = b;
    a = 42;
    auto rhs = b;
    return lhs + rhs;
}

// std::byte& can alias with long&
auto bar(std::byte& a, long& b) {
    auto lhs = b;
    a = (std::byte)42;
    auto rhs = b;
    return lhs + rhs;
}

// if my_byte& can alias with long&, there would have to be two loads
auto baz(my_byte& a, long& b) {
    auto lhs = b;
    a = (my_byte)42;
    auto rhs = b;
    return lhs + rhs;
}

This results in the following:

foo(int&, long&):
        mov     rax, QWORD PTR [rsi]
        mov     DWORD PTR [rdi], 42
        add     rax, rax
        ret
bar(std::byte&, long&):
        mov     rax, QWORD PTR [rsi]
        mov     BYTE PTR [rdi], 42
        add     rax, QWORD PTR [rsi]
        ret
baz(my_byte&, long&):
        mov     rax, QWORD PTR [rsi]
        mov     BYTE PTR [rdi], 42
        add     rax, rax
        ret

Thus my_byte does not inherit the aliasing properties of char and std::byte: baz needs only a single load, just like foo.

Justin
  • Thanks, the problem with this approach is that it doesn't support integer assignment conveniently (needs a cast), and I don't see any way to fix this. – geza Sep 08 '18 at 21:13
  • @geza You can write `my_byte{n}`, making it look like any other type without implicit conversions. – Justin Sep 08 '18 at 21:24
  • I mean, I'd like to write `a = 42;`, just like I do for `unsigned char`. With no casts. – geza Sep 08 '18 at 21:29
  • @geza I can't think of a way to make that work, and it really isn't a big deal. Lacking implicit conversions isn't much of a pain, especially since you don't have to write `static_cast<my_byte>(n)`. Also, I think it's technically not considered a cast. You could add a literal for it like `42_byte`. – Justin Sep 08 '18 at 21:31
  • I intend to replace `unsigned char` with this hypothetical type in my code base. It would mean a lot of additional casts. It is noise, I don't like it. – geza Sep 08 '18 at 21:35
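
For reference, the `42_byte` literal mentioned in the comments could be sketched as below. The suffix name comes from the comment; the range check and the choice of `std::out_of_range` are my own assumptions. It only helps with constants, so assigning a runtime integer still needs `my_byte{n}` or a cast.

```cpp
#include <stdexcept>

enum class my_byte : unsigned char {};

// User-defined literal: 42_byte yields a my_byte. In a constant expression an
// out-of-range literal is a compile error (the throw is not a constant
// expression); otherwise it throws at run time.
constexpr my_byte operator""_byte(unsigned long long v) {
    return v <= 0xFF
        ? static_cast<my_byte>(static_cast<unsigned char>(v))
        : throw std::out_of_range("literal does not fit in my_byte");
}

// Example use: no cast at the point of assignment.
constexpr my_byte answer = 42_byte;
```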
0

You can define your own type:

#include <type_traits>

class uchar {
    unsigned char value = {};

public:
    template <typename T,
        std::enable_if_t<
            std::is_convertible_v<T, unsigned char>,
            int
        > = 0>
    constexpr uchar(T value)
        : value{static_cast<unsigned char>(value)}
    {}

    constexpr uchar()
    {}

    template <typename T,
        std::enable_if_t<
            std::is_convertible_v<T, unsigned char>,
            int
        > = 0>
    constexpr uchar& operator=(T value)
    {
        this->value = static_cast<unsigned char>(value);
        return *this;
    }

    explicit constexpr operator unsigned char() const
    {
        return value;
    }

    friend constexpr uchar operator+(uchar lhs, uchar rhs) {
        return lhs.value + rhs.value;
    }

    friend constexpr uchar operator-(uchar lhs, uchar rhs) {
        return lhs.value - rhs.value;
    }

    // And so on...
};

// The compiler could technically add padding after the `value` member of
// `uchar`, so we `static_assert` to verify that it didn't. I can't imagine
// any sane implementation would do so for a single-member type like `uchar`
static_assert(sizeof(uchar) == sizeof(unsigned char));
static_assert(alignof(uchar) == alignof(unsigned char));
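
To check the wrapper against the requirements in the question, here is a condensed copy of the class together with a few more operators, as a guess at what "And so on..." might contain. The function `set_high_bit` is purely illustrative. The templated `operator=` is omitted for brevity; `a = 42;` still compiles here through the converting constructor and the implicit copy assignment.

```cpp
#include <type_traits>

class uchar {
    unsigned char value = {};

public:
    constexpr uchar() {}

    // Implicit conversion from anything convertible to unsigned char.
    template <typename T,
              std::enable_if_t<std::is_convertible<T, unsigned char>::value, int> = 0>
    constexpr uchar(T v) : value{static_cast<unsigned char>(v)} {}

    // Conversion back is explicit, as in the class above.
    explicit constexpr operator unsigned char() const { return value; }

    // Results are ints after promotion; the constructor truncates them again.
    friend constexpr uchar operator&(uchar lhs, uchar rhs) { return lhs.value & rhs.value; }
    friend constexpr uchar operator|(uchar lhs, uchar rhs) { return lhs.value | rhs.value; }
    friend constexpr uchar operator<<(uchar lhs, int shift) { return lhs.value << shift; }
    friend constexpr bool operator==(uchar lhs, uchar rhs) { return lhs.value == rhs.value; }
};

static_assert(sizeof(uchar) == 1, "sizeof is 1");
static_assert(std::is_trivially_copyable<uchar>::value, "trivially copyable");
static_assert(!std::is_convertible<uchar, unsigned char>::value,
              "conversion back to unsigned char is explicit only");

// Illustrative use: integer operands convert implicitly, no cast needed.
constexpr uchar set_high_bit(uchar x) { return x | 0x80; }
```

With this, `uchar a; a = 42;` works without a cast, and a mixed expression such as `a | 0x80` resolves to the `uchar` overload because there is no implicit conversion back to an integer.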
Justin
  • @Ivan That was on purpose. Note that the `operator unsigned char()` is `explicit`. You have to explicitly ask for it with `static_cast<unsigned char>(uchar())`. You could remove the `explicit` if you want an implicit conversion. – Justin Sep 08 '18 at 21:59
  • Then `int() ? int() : uchar()` becomes broken. –  Sep 08 '18 at 22:01
  • @Ivan Which is why I don't recommend implicit conversions in both directions – Justin Sep 08 '18 at 22:04
  • Note that there is no way to guarantee that this type has the size/alignment of `unsigned char`. You can `static_assert` on it, but that's the best you can do. – Nicol Bolas Sep 08 '18 at 22:04
  • @NicolBolas Very true. I would be very surprised if those `static_assert`s failed, however. I can't imagine any implementation would add padding to a single-member type. – Justin Sep 08 '18 at 22:14