
Reading Herb Sutter's blog post about the most recent C++ standard meeting, I noticed that std::byte was added to C++17. On an initial reading, I have some concerns, since it is based on unsigned char so that it can avoid complications with strict aliasing rules.

My biggest concern is, how does it work on platforms where CHAR_BIT is not 8? I have worked on/with platforms where CHAR_BIT is 16 or 32 (generally DSPs). Given that std::byte is for dealing with "byte-oriented access to memory", and most people understand byte to indicate an octet (not the size of the underlying character type), how will this work for individuals who expect that this will address contiguous 8-bit chunks of memory?

I already see people who just assume that CHAR_BIT is 8 (not even knowing that CHAR_BIT exists...). A type called std::byte is likely to introduce even more confusion.
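One way to keep that assumption from staying silent is to state it in the code. The following is a minimal sketch, not something taken from the std::byte proposal; the helper name `bits_in` is hypothetical and only serves the illustration.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Refuse to compile on platforms where a byte is not exactly 8 bits,
// instead of silently assuming it. On a DSP with CHAR_BIT == 16 this
// line stops the build with the given message.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical helper: number of bits occupied by an object of type T.
// sizeof counts bytes, so the result scales with CHAR_BIT.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

Code that genuinely has to run where CHAR_BIT is 16 or 32 would drop the `static_assert` and compute sizes through something like `bits_in` instead.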


I guess that what I expected was that they were introducing a type to permit consistent addressing/access to sequential octets for all cases. There are many octet-oriented protocols where it would be useful to have a library or type that is guaranteed to access memory one octet at a time on all platforms, no matter what CHAR_BIT is equal to on the given platform.
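std::byte does not provide this, but octet-oriented formats can still be handled without depending on CHAR_BIT by shifting and masking explicitly. The sketch below assumes a big-endian 32-bit field; the function names `put_u32_be` and `get_u32_be` are hypothetical. Note that it stores one octet per byte, so on a platform with CHAR_BIT == 16 the upper bits of each element are simply left at zero rather than packed.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint_least32_t

// Hypothetical helper: write the low 32 bits of `value` as four octets,
// most significant octet first. Each output element carries exactly one
// octet in its low 8 bits, whatever CHAR_BIT is.
inline void put_u32_be(unsigned char* out, std::uint_least32_t value) {
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * (3 - i))) & 0xFFu);
    }
}

// Hypothetical helper: read four octets back into a 32-bit value.
// Masking with 0xFF discards any bits above the octet on wide-byte platforms.
inline std::uint_least32_t get_u32_be(const unsigned char* in) {
    std::uint_least32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | (in[i] & 0xFFu);
    }
    return value;
}
```

The design choice here is to never reinterpret memory: the value is built from arithmetic on individual elements, so neither byte width nor host endianness affects the result.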

I can definitely understand wanting a well-specified way to say that something is being used as a sequence of bytes rather than a sequence of characters, but it doesn't seem as useful as many other things might be.

Jan Schultke
Graznarak
  • 11
    One byte has `CHAR_BIT` bits as far as the standard is concerned. That is why the `sizeof(char)` is always 1, even on those DSPs. – NathanOliver Mar 31 '17 at 14:36
  • 4
    Those individuals need to adjust their expectations. Or, to put it more bluntly, stop being wrong. – molbdnilo Mar 31 '17 at 14:38
  • 2
    If your concern is that some people might use it wrong, then that's true of literally everything in C++. Do you have a more specific question? – Barry Mar 31 '17 at 14:45
  • @NathanOliver, so does that mean one `std::byte` has `CHAR_BIT` bits and `sizeof (std::byte)` is always 1? – franji1 Mar 31 '17 at 14:55
  • 5
    http://en.cppreference.com/w/cpp/types/byte, it basically *is* unsigned char, sizeof will be 1. It just has a different type so you can't accidentally use it as a string or integer type, is the impression I get at first glance. – Kenny Ostrom Mar 31 '17 at 14:57
  • @franji1 [As it is defined on cppreference](http://en.cppreference.com/w/cpp/types/byte), yes. – NathanOliver Mar 31 '17 at 15:06
  • @KennyOstrom I read the paper that Herb Sutter referenced on his blog post. Yes, it is/will be essentially an alias to `unsigned char` so it will always have a size of 1. – Graznarak Mar 31 '17 at 15:09
  • It's also worth noting that memory hardware suppliers always describe the capacity in bits. This gives a higher numerical value but also avoids confusion about the size of a byte. (e.g. http://media.digikey.com/pdf/Data%20Sheets/Micron%20Technology%20Inc%20PDFs/P30%20StrataFlash%20Embedded%20Memory.pdf uses "density of xxx MBits") – harper Apr 04 '17 at 09:06

1 Answer


Given that std::byte is for dealing with "byte-oriented access to memory", and most people understand byte to indicate an octet (not the size of the underlying character type), how will this work for individuals who expect that this will address contiguous 8-bit chunks of memory?

You can't understand something wrong and then expect the world to rearrange itself to fit your expectations.

The reason most people think a byte and an octet are the same thing is that in most cases they are. The vast majority of typical computers have CHAR_BIT == 8. That doesn't mean it is true all the time.

  • A byte is not an octet.
  • char, signed char and unsigned char have a size of one byte.
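These two points can be written down as compile-time checks. This is only a sketch, assuming a C++17 compiler (needed for std::byte); the function `bits_per_byte` is a hypothetical name added for illustration.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::byte, std::size_t
#include <limits>   // std::numeric_limits

// The character types (and std::byte) are exactly one byte by definition,
// on every platform, including ones with 16- or 32-bit bytes.
static_assert(sizeof(char) == 1, "char is one byte");
static_assert(sizeof(signed char) == 1, "signed char is one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");
static_assert(sizeof(std::byte) == 1, "std::byte is one byte");

// A byte has CHAR_BIT bits, which is at least 8 but may be more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses every bit of the byte");

// Hypothetical helper: the width of a byte on this platform, in bits.
constexpr std::size_t bits_per_byte() { return CHAR_BIT; }
```

None of these assertions mentions the number 8 as an equality, which is the point: a byte is only an octet when `bits_per_byte()` happens to return 8.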

The good news, though, is that people who don't know this are usually people who don't need to know. If you're working on a machine where a byte is wider than an octet, you are exactly the kind of developer who needs to know it.

If we're talking theory here, then the answer is simple: just learn that a byte is different from an octet. If we're talking concrete stuff, then the answer is that you either know the difference already or you won't need to know it (hopefully :)). The worst case is learning this the painful way, but that only affects a third, small group: developers working on exotic platforms without the exotic knowledge.


If you want an equivalent for octets, it already exists: the fixed-width integer types std::int8_t and std::uint8_t from <cstdint>.

Note that they are "provided only if the implementation directly supports the type".
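Because exact-width types such as `std::uint8_t` are optional, code can test for them before relying on them. The sketch below uses the `UINT8_MAX` macro, which `<cstdint>` defines only when `std::uint8_t` exists; the alias `octet_storage` and the function `to_octet` are hypothetical names, and falling back to `std::uint_least8_t` is one possible choice rather than the only one.

```cpp
#include <cstdint>  // std::uint8_t (optional), std::uint_least8_t, UINT8_MAX

#ifdef UINT8_MAX
// The exact-width type exists, so bytes on this platform are 8 bits wide.
using octet_storage = std::uint8_t;
#else
// No 8-bit type available (e.g. CHAR_BIT == 16): use the smallest type
// that can hold at least 8 bits and mask values by hand.
using octet_storage = std::uint_least8_t;
#endif

// Hypothetical helper: keep only the low 8 bits of a value, so the result
// is a valid octet even when octet_storage is wider than 8 bits.
constexpr octet_storage to_octet(unsigned value) {
    return static_cast<octet_storage>(value & 0xFFu);
}
```

Neither alias is guaranteed to be a character type, so this does not grant the aliasing permissions that `unsigned char` and `std::byte` have.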

Drax
  • 3
    And yet, those are not guaranteed to work for most purposes that people use. int8_t is not guaranteed to be an alias for signed char (or just char), so you cannot use those to alias data and avoid strict aliasing violations. – Graznarak Mar 31 '17 at 16:14
  • 1
    I agree that the ultimate solution user-wise would be something like `std::octet`, but this is probably nearly impossible to implement on platforms whose minimum usable size is greater than 8 bits (say 16 bits). So for the sake of the language's portability, I'm not sure you're going to get anything better than this any day. I'd love to be wrong though :) – Drax Mar 31 '17 at 17:34