
When trying to compile this code:

std::fstream file("file.name", std::ios::out | std::ios::binary);
uint8_t buf[BUFSIZE];
//Fill the buffer, etc...
file.write(buf, BUFSIZE);

the compiler gives me a warning about a not-so-healthy conversion from unsigned char to char in the call to write(). Since std::fstream is in fact just a typedef for std::basic_fstream<char>, one could think that using std::basic_fstream<uint8_t> instead would allow the above code to compile without the warning, since write() expects a pointer to the template character type.

This works, of course, but another problem pops up. Even though this code compiles perfectly fine:

std::basic_fstream<uint8_t> file("file.name", std::ios::out | std::ios::binary);
uint8_t buf[BUFSIZE];
//Fill the buffer, etc...
file.write(buf, BUFSIZE);

it now fails at runtime on the call to write(), even though the previous version worked (compiler warnings aside). It took me a while to pinpoint where the exception is thrown from in the standard C++ library code, but I still don't really understand what is going on here. It looks like std::basic_fstream uses some character-encoding mechanism, and since one is defined for char but none for unsigned char, the file stream fails when trying to use the "wrong" character data type... That's how I see it, at least.

But that's also what I don't understand. There should be no need for any character encoding here. I don't even open the file in text mode; I want to deal with binary data. That's why I use arrays of uint8_t rather than char: it feels more natural to use this data type than plain old char. But before I either give up on uint8_t and accept working with char buffers, or start using arrays of a custom byte type defined as char, I'd like to ask two questions:

  1. What exactly is the mechanism that stops me from using an unsigned character type? Is it really something related to character encoding, or does it serve some other purpose? Why does the file stream work fine with signed character types but not with unsigned ones?
  2. Assuming I still wanted to use std::basic_fstream<uint8_t>, regardless of how (un)reasonable that is, is there any way to achieve it?
PookyFan
  • The stream's internals simply don't support `unsigned char`. Just let it use `char` normally, you will just have to perform a type-cast on `write()`, eg: `std::ofstream file("file.name", std::ios::binary); ... file.write(reinterpret_cast<const char*>(buf), BUFSIZE);` – Remy Lebeau Nov 17 '20 at 22:53
  • I know that, but it's sooooo ugly. – PookyFan Nov 17 '20 at 23:03
  • Don't worry `reinterpret_cast`, mother still loves you. – user4581301 Nov 17 '20 at 23:04
  • @PookyFan ugly or not, it is what you have to do, though – Remy Lebeau Nov 17 '20 at 23:09
  • Yeah, either that or just use `char` arrays. But this entire question came from seeking an alternative solution, as neither really appeals to me. But if I were to choose between the two, I think I'd rather use compatible arrays than cast the pointer just for writing the data, especially if there were more `write()` calls in the code. – PookyFan Nov 17 '20 at 23:12

2 Answers


std::basic_fstream<unsigned char> doesn't work because it uses std::char_traits<unsigned char>, but the standard library doesn't provide such a specialisation; see std::char_traits for full details.

If you'd like to read/write binary data, you need to use std::basic_fstream<char>, open it with the std::ios_base::binary flag, and use the std::basic_ostream<CharT,Traits>::write function to write binary data.

That's a bit of a legacy, since any character type can be used to represent binary data. The standard library uses char probably because it's the shortest one to type and read that does the job.
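For example, a minimal sketch of this approach (the buffer size and file name are illustrative):

#include <cstddef>
#include <cstdint>
#include <fstream>

int main()
{
    constexpr std::size_t BUFSIZE = 256;   // illustrative size
    std::uint8_t buf[BUFSIZE] = {};        // fill the buffer, etc...

    std::fstream file("file.name", std::ios::out | std::ios::binary);
    // The stream traffics in char, so the uint8_t buffer is cast for write().
    file.write(reinterpret_cast<const char*>(buf), BUFSIZE);
}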


What exactly is the mechanism that stops me from using an unsigned character type?

No std::char_traits<unsigned char> specialization.

Is it really something related to character encoding, or does it serve some other purpose?

std::char_traits has a few purposes, exactly defined by its interface, but those don't include decoding/encoding. The latter is done by std::codecvt; see the usage example there.

Why does the file stream work fine with signed character types but not with unsigned ones?

Because std::basic_ostream<CharT,Traits>::write accepts CharT, the first template parameter you specify for the stream. It writes the same character type it reads, and it uses the stream's codecvt facet to convert from CharT to bytes.

Assuming I still wanted to use std::basic_fstream<uint8_t>, regardless of how (un)reasonable that is, is there any way to achieve it?

The standard class and function templates cannot be specialized for built-in types, if I am not mistaken. You'd need to create another class with the std::char_traits interface and specify it as the second template argument for the standard streams, as sketched below. I guess you would need a pretty strong (philosophical) reason to roll up your sleeves and do that.
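For illustration, here is a minimal sketch of such a traits class. The name uint8_traits and the one-line implementations are my own illustrative choices, not part of any standard API, and (as the comments below point out) a traits class alone is not enough for file streams: a matching std::codecvt<std::uint8_t, char, std::mbstate_t> facet would also have to be written and imbued into the stream's locale before any I/O could succeed.

#include <cstdint>
#include <cstring>
#include <cwchar>
#include <ios>

// Sketch of a char_traits-like class for uint8_t (hypothetical name).
struct uint8_traits
{
    using char_type  = std::uint8_t;
    using int_type   = int;
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& r, const char_type& a) noexcept { r = a; }
    static bool eq(char_type a, char_type b) noexcept { return a == b; }
    static bool lt(char_type a, char_type b) noexcept { return a < b; }

    static int compare(const char_type* s1, const char_type* s2, std::size_t n)
    { return std::memcmp(s1, s2, n); }

    static std::size_t length(const char_type* s)
    { std::size_t n = 0; while (s[n] != 0) ++n; return n; }

    static const char_type* find(const char_type* s, std::size_t n, const char_type& c)
    { return static_cast<const char_type*>(std::memchr(s, c, n)); }

    static char_type* move(char_type* dst, const char_type* src, std::size_t n)
    { return static_cast<char_type*>(std::memmove(dst, src, n)); }

    static char_type* copy(char_type* dst, const char_type* src, std::size_t n)
    { return static_cast<char_type*>(std::memcpy(dst, src, n)); }

    static char_type* assign(char_type* s, std::size_t n, char_type c)
    { return static_cast<char_type*>(std::memset(s, c, n)); }

    static int_type to_int_type(char_type c) noexcept { return c; }
    static char_type to_char_type(int_type i) noexcept { return static_cast<char_type>(i); }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
    static int_type eof() noexcept { return -1; }
    static int_type not_eof(int_type i) noexcept { return i == eof() ? 0 : i; }
};

// Usage (sketch): std::basic_fstream<std::uint8_t, uint8_traits> stream(...);
// File I/O will still fail until a std::codecvt<std::uint8_t, char, std::mbstate_t>
// facet is also provided and imbued into the stream's locale.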

If you don't, you may like to keep using std::fstream (that is, std::basic_fstream<char>) and do stream.write(reinterpret_cast<char const*>(buf), sizeof buf);.

Maxim Egorushkin
  • All right, but I still want to know the answer to the second question, as it bugs me a lot. Even if there is no `std::char_traits` defined, can I somehow define it myself and "feed" the file stream object with it (without complicating my code too much, that is)? If the answer is "no" (either because the std class design doesn't allow it or it would require me to create a subclass just for that purpose), then so be it - I can live with working with binary data in the form of `char`s. But just out of curiosity I'd like to know a way to make my original solution work, if there is any. – PookyFan Nov 17 '20 at 23:02
  • @PookyFan I think you updated the question, let me re-read and update my answer. – Maxim Egorushkin Nov 17 '20 at 23:03
  • Well, that's a very elaborate edit, very informative. Thanks for that; I'm going to upvote and accept your answer. – PookyFan Nov 17 '20 at 23:49
  • @PookyFan Thank you for your kind words, that is something I cannot not reward. – Maxim Egorushkin Nov 17 '20 at 23:59
  • @PookyFan To be more pedantic, you'd have to **at least** create another `std::char_traits` **and** another `std::codecvt` for `unsigned char`. – Maxim Egorushkin Nov 18 '20 at 00:09

Actually, char and uint8_t can be different types, which also means they can have different std::char_traits. The character traits type is the second template parameter of std::basic_fstream; by default it is std::char_traits instantiated with the character type. std::basic_fstream does formatted I/O by default via the character traits template parameter; it does not simply pass raw bytes through unchanged. This may be why you are getting different results.
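As a small illustration (a sketch; the output depends entirely on the implementation), you can check whether uint8_t is a distinct type or just an alias on your compiler:

#include <cstdint>
#include <iostream>
#include <type_traits>

int main()
{
    // On most implementations uint8_t is an alias for unsigned char,
    // but the standard does not require that, so the result may vary.
    std::cout << std::boolalpha
              << "uint8_t is unsigned char: "
              << std::is_same<std::uint8_t, unsigned char>::value << '\n'
              << "uint8_t is char:          "
              << std::is_same<std::uint8_t, char>::value << '\n';
}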

Anonymous1847
  • `(u)int8_t` as a distinct type is *optional*, it *could* just be an alias for `(unsigned) char` instead. That is for the compiler vendor to decide. – Remy Lebeau Nov 17 '20 at 22:55
  • @RemyLebeau Yes, what I meant is they are not necessarily the same. – Anonymous1847 Nov 17 '20 at 22:55
  • But you are quite right that `signed char`, `unsigned char` and `char` are distinct types. `uint8_t` is required to be unsigned; however, the signedness of `char` depends on the target architecture, and `gcc` has a command-line option for the signedness of `char`, so `uint8_t` should explicitly be defined as `unsigned char` to be universally robust with no nasty surprises (do not break the fundamental engineering principle of _least surprise_). Having `uint8_t` defined as `char` or `unsigned char` on different architectures would be a recipe for difficult bugs. – Maxim Egorushkin Nov 18 '20 at 00:18