
The recent addition of std::byte to C++17 got me wondering why this type was added to the standard at all. Even after reading the cppreference page, its use cases don't seem clear to me.

The only use case I can come up with is that it expresses intent more clearly: std::byte should be treated only as a collection of bits, not as a character type such as char, which we previously used for both purposes. Meaning that this:

std::vector<std::byte> memory;

is clearer than this:

std::vector<char> memory;

Is this the only use case and reason it was added to the standard, or am I missing a big point here?

Hatted Rooster
  • [This](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0298r0.pdf) seems to be the document behind the addition. It has a section for the motivations behind the proposal. – bgfvdu3w Jan 20 '18 at 10:58
  • There are subtle problems that can bite you using `char` such as it is implementation defined if it is signed or not. Signed `char` can produce a surprising value if it gets cast to an `int` (due to sign extension). So you could write code that works on some systems but not others. Also you can accidentally do math on `char` (because it's an integer) which won't be a problem for `std::byte`. So this could potentially benefit any code that performs serialization/streaming data such as networking. – Galik Jan 20 '18 at 11:03
  • @Galik: when was the last time you "accidentally" did the math on something? Generally if I write `+` it doesn't happen by accident. – Matteo Italia Jan 20 '18 at 11:49
  • My rationalization for the addition of `std::byte` to the standard with its current definition is that the procedure for standard definition of C++ is clearly broken. – 6502 Jan 20 '18 at 12:03
  • @MatteoItalia Anything that can go wrong, will have gone wrong at least a few times in any large body of source code. Unless you have a team of programmers that never write the wrong variable name by mistake. – Galik Jan 20 '18 at 13:06
  • @Galik: safeguards are important, but come at a cost in usability, that's why generally you introduce them for actual problems, not imagined ones. When was the last time I overflew an array? Last week. Use after free? Doesn't happen *that* often to modern code I write, but I fixed plenty of bugs related to memory and ownership. Memory leaks? Concurrency bugs? It's all stuff that happens frequently. Now I'll ask you again, when was the last time you *accidentally* did math on something? Personally I cannot remember, and instead I wrote a decent amount of bit fiddling code that needs arithmetic. – Matteo Italia Jan 20 '18 at 13:43

1 Answer


The only use case I can come up with is that it more clearly expresses intent

I think it was one of the reasons. [This paper](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0298r0.pdf) (P0298, also linked in the comments above) explains the motivation behind std::byte and compares its usage with that of char:

Motivation and Scope

Many programs require byte-oriented access to memory. Today, such programs must use either the char, signed char, or unsigned char types for this purpose. However, these types perform a “triple duty”. Not only are they used for byte addressing, but also as arithmetic types, and as character types. This multiplicity of roles opens the door for programmer error – such as accidentally performing arithmetic on memory that should be treated as a byte value – and confusion for both programmers and tools. Having a distinct byte type improves type-safety, by distinguishing byte-oriented access to memory from accessing memory as a character or integral value. It improves readability.

Having the type would also make the intent of code clearer to readers (as well as tooling for understanding and transforming programs). It increases type-safety by removing ambiguities in expression of programmer’s intent, thereby increasing the accuracy of analysis tools.
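
To make the paper's point concrete, here is a minimal sketch of my own (the values are made up for illustration): with bytes stored as `char`, arithmetic compiles silently, and because the signedness of plain `char` is implementation-defined, the same byte pattern can widen to different values on different platforms.

```cpp
#include <cstdio>
#include <vector>

int main() {
    // Bytes stored as char: nothing stops accidental arithmetic on them.
    std::vector<char> buffer = {0x12, 0x34};
    buffer[0] = buffer[0] + buffer[1];  // compiles silently, even if it was a mistake

    // Whether plain char is signed is implementation-defined, so widening a
    // "byte" to int may or may not sign-extend depending on the platform.
    char c = static_cast<char>(0xFF);
    std::printf("%d\n", static_cast<int>(c));  // -1 if char is signed, 255 if unsigned
}
```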

Another reason is that std::byte restricts the set of operations that can be performed on it:

Like char and unsigned char, it can be used to access raw memory occupied by other objects (object representation), but unlike those types, it is not a character type and is not an arithmetic type. A byte is only a collection of bits, and only bitwise logic operators are defined for it.

which provides additional type safety, as mentioned in the paper above.
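
As a rough illustration of that restriction (my own sketch, not taken from cppreference, assuming a C++17 compiler): `std::byte` supports bitwise operators and explicit conversion through `std::to_integer`, while the commented-out lines below do not compile:

```cpp
#include <cstddef>  // std::byte, std::to_integer
#include <cstdio>

int main() {
    std::byte b{0b1010'0001};

    b |= std::byte{0b0000'1110};          // bitwise operators are defined
    b = b << 1;                           // shifts take an ordinary integer count
    int value = std::to_integer<int>(b);  // going back to arithmetic is explicit

    std::printf("%d\n", value);

    // b = b + std::byte{1};              // error: no arithmetic operators on std::byte
    // int i = b;                         // error: no implicit conversion to int
}
```

The explicit `std::to_integer` call is the point: turning a byte back into a number has to be visible in the code rather than happening implicitly.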

Edgar Rokjān
  • IMO the fact that `std::byte` is not an arithmetic type is just a solution in search of a problem. I cannot think of a situation where I ever worked on a sequence of bytes *just to use bitwise operations*. If I want a sequence of opaque bytes I use `void *` (although it has the downside of pointer arithmetic not working correctly), if I want to do low level bytes manipulation give me the whole arsenal of operations. We already have a type that is great at being a "raw byte": it's called `unsigned char`. Bitwise and arithmetic operations work in a guaranteed and unsurprising way, ... – Matteo Italia Jan 20 '18 at 11:32
  • ... `unsigned char` pointers can already alias any other pointer (without polluting the whole standard with an extra magic type), pointer arithmetic already works fine. The only problem with `unsigned char` is that it's a PITA to write, is a two-token name (so function style casts & co. are problematic) and that its name may not be as suggestive. These problems are solved by doing `namespace std { typedef unsigned char byte; }` - which incidentally is essentially what has been done in countless code bases that manipulate raw bytes. – Matteo Italia Jan 20 '18 at 11:35
  • @MatteoItalia: "*These problems are solved by doing...*" Nonsense. `unsigned char*` could be a byte array or a character array. That "solution" doesn't allow you to actually differentiate between the two; it's no better than a comment that says that a parameter is a byte array. Furthermore, because of the lack of distinction, the standard requires that `unsigned char*` can alias freely. This breaks a number of optimizations on functions that take strings as opposed to generic byte arrays. Only byte arrays should be able to alias freely. A strong typedef is essential to fixing this. – Nicol Bolas Jan 20 '18 at 17:03
  • @MatteoItalia, inserting anything like that into the std namespace is undefined behavior, so that's not even an acceptable solution from the get go. – Mário Feroldi Jan 20 '18 at 17:28
  • @MárioFeroldi: of course nobody does that into the `std` namespace, I just meant that everybody has his own shorter `typedef` for `unsigned char`. – Matteo Italia Jan 20 '18 at 18:26
  • @NicolBolas: nonsense, in virtually all the C and C++ standard libraries characters are plain `char` (whose signedness is idiotically implementation-defined, but we are digressing). "it's no better than a comment that says that a parameter is a byte array" which is perfectly fine; this necessity is entirely artificial. All the rest of your comment is pointless, given that the standard lets `char`, `signed char`, `unsigned char` and `std::byte` all alias freely whatever other type, so I don't see how this "essential" strong typedef has changed anything in the aliasing situation. – Matteo Italia Jan 20 '18 at 18:32
  • Now, if you want to say that the character types should not have the aliasing wildcard and that there should have been a separate byte type, and only it should have the aliasing magic that's something we can discuss for another language (and I'll agree only if the `byte` type behaves like an unsigned integer), but unfortunately this isn't something you can change in C++, and adding a lame `std::byte` today only adds confusion. – Matteo Italia Jan 20 '18 at 18:38
  • @MatteoItalia: It's an essential *first step*. Later, once lots of code and APIs have converted to using `std::byte`, they could perhaps deprecate the aliasing, with the goal of eventually removing it. But unless you have a type that is distinct from a character array, there is no way to move towards undoing this aliasing. Also, `signed char` doesn't participate in aliasing. – Nicol Bolas Jan 20 '18 at 18:38
  • Deprecating aliasing for char types is something that cannot be done, period. It's a silent death trap that would break tons of perfectly legal code, and would be the final confirmation that the committee has jumped the shark. There are way more pragmatic, opt-in approaches to the problem (in the style of `restrict`) - when you can actually measure it, aliasing is way overstated as a problem. – Matteo Italia Jan 20 '18 at 19:11