0

My first question is: why is it customary to use unsigned chars when writing to files in binary mode? In all of the examples I have seen, any other numerical value is cast to unsigned char before being written to the binary file.

My second question is: what's so bad about using stream operators to write to binary files? I've heard that the read() and write() functions are best for binary files, but I don't really understand why that's the case. Using stream operators to write to binary files works fine for me IF I first cast the value to unsigned char.

float num = 500.5f;
ofstream file("file.txt", ios::binary);

file << num;  // results in gibberish when I try to read the file later
file << (unsigned char)num;  // no problems reading the file with stream operators

Thanks in advance.

InvalidBrainException

3 Answers

3

char is the smallest type in C/C++ (by definition, sizeof(char) == 1), so it's the usual way to view objects as a sequence of bytes. unsigned is used to avoid signed arithmetic getting in the way, and because it best represents binary contents (a value between 0 and 255).

To operate on binary files, streams provide the read and write functions. The insertion and extraction operators, by contrast, are formatted. It's working for you only by chance: if you output an integer with <<, for instance, it writes the textual representation of the value, not its binary representation. In your example you cast a float to an unsigned char before outputting it, which converts the real value to a small integer that is then written as a single character. What do you get when you try to read the float back from the file?
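The difference between the two kinds of output can be sketched as follows. This is a minimal illustration, not code from the question: the helper names `formatted`, `unformatted`, and `insertAsChar` are made up for the example, and a `std::ostringstream` stands in for the file so the result can be inspected directly.

```cpp
#include <sstream>
#include <string>

// Formatted insertion: the value is converted to text first.
std::string formatted(float value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unformatted output: the bytes of the object are copied unchanged.
std::string unformatted(float value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}

// Inserting an unsigned char with << writes it as one character,
// not as a number, which is why the cast appears to "work".
std::string insertAsChar(unsigned char value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

With the default formatting settings, `formatted(500.5f)` yields the five characters of the text "500.5", while `unformatted(500.5f)` yields `sizeof(float)` raw bytes whose content depends on the platform's float representation.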

K-ballo
  • `unsigned is used to avoid signed arithmetic getting in the way, and because it best represents binary contents (a value between 0 and 255).` I don't think that matters. – Nawaz Sep 18 '11 at 08:11
  • @Nawaz: What? Avoiding signed arithmetic or getting a value between 0 and 255? Because the latter certainly matters to me, and the former prevents bugs like sign-extended conversion to bigger integer types. – K-ballo Sep 18 '11 at 08:15
  • Are you talking about when using `write()`? If so, then it doesn't matter. In fact, using `unsigned` would give a compilation error. – Nawaz Sep 18 '11 at 08:17
  • @Nawaz: No, we are talking about what's customary. Using `write` requires a buffer of the stream's element type, which is `char` for `std::ostream`. When I write binary files I use an output stream with `unsigned char` as the element type instead. – K-ballo Sep 18 '11 at 08:24
  • @K-ballo: What really clicked for me was your mentioning that sizeof(char) == 1. Of course I knew that, but I never made the connection of how convenient that would be in representing bytes. Thank you. – InvalidBrainException Oct 09 '11 at 12:35
2

Because all the overloads of operator<< are formatted functions: they format the data before writing it to the output file. In other words, they cannot be used if you want to write binary data to a file. Binary data can be written to a file with unformatted functions - those which don't format the data.

std::ostream provides one unformatted output function called write(), with the following signature:

ostream& write ( const char* s , streamsize n );

which also answers your other question:

why is it customary to use unsigned chars for writing to files in binary mode?

No, that premise is wrong: the function write() accepts const char*, not const unsigned char*.
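Since write() and read() only take pointers to char, anything else has to be passed through a cast. Here is a hedged sketch of what that looks like; `writeBytes` and `roundTrip` are hypothetical helper names, and string streams are used in place of a file so the example stays self-contained. The same calls apply to a `std::ofstream` or `std::ifstream` opened with `std::ios::binary`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// write() takes const char*, so a buffer of unsigned char needs a cast.
std::string writeBytes(const unsigned char* data, std::size_t count) {
    std::ostringstream out(std::ios::binary);
    out.write(reinterpret_cast<const char*>(data),
              static_cast<std::streamsize>(count));
    return out.str();
}

// Write the raw bytes of a float, then read them back into another float.
float roundTrip(float value) {
    std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
    stream.write(reinterpret_cast<const char*>(&value), sizeof value);

    float restored = 0.0f;
    stream.read(reinterpret_cast<char*>(&restored), sizeof restored);
    return restored;
}
```

Note that a file written this way is only portable between programs that agree on the size and byte layout of float.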

--

The online doc says about operator<<:

This operator (<<) applied to an output stream is known as insertion operator. It performs an output operation on a stream generally involving some sort of formatting of the data (like for example writing a numerical value as a sequence of characters).

and it says about write():

This is an unformatted output function and what is written is not necessarily a c-string, therefore any null-character found in the array s is copied to the destination and does not end the writing process.

Nawaz
  • Your first paragraph is very much tautological and circular. You mention that a formatted function is not fit to write binary data since they format the data, and that unformatted functions are fit since they do not format. All of which is self-evident. – Luc Danton Sep 18 '11 at 09:18
  • @Luc: I think you're saying it because of this >> `- those which don't format the data`. It merely clarifies what an *unformatted* function is. One might think that `printf` is a *formatted* function *because* it accepts a *formatted* string of type `const char*` as first argument. This is not what *formatted* means here. – Nawaz Sep 18 '11 at 09:23
  • @Nawaz: Thank you for the explanation. Wish there was a "thank you" button. – InvalidBrainException Oct 09 '11 at 12:36
  • @Terribad: Yes, there is a *thank you* button. Here on stackoverflow, its name is different; it's called *Upvote Arrow*. – Nawaz Oct 09 '11 at 13:18
1

The reason to use unsigned char is that it is guaranteed to be unsigned, which is very much desirable when it comes to bitwise operations -- which can come in handy when manipulating binary data. You have to keep in mind that char (also known as plain char) is a separate type from unsigned char, and whether it is signed or unsigned is implementation-defined.
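A small sketch of why the signedness matters when bytes are widened or shifted. The function names are invented for this illustration; the byte 0xF0 is represented as the signed value -16 in the first case.

```cpp
// Converting a negative signed char to unsigned int yields the value
// modulo 2^N; on two's complement machines this looks like sign extension,
// so the high bits of the result are all set.
unsigned int widenSigned(signed char byte) {
    return static_cast<unsigned int>(byte);
}

// Converting an unsigned char zero-extends, so the byte value is preserved.
unsigned int widenUnsigned(unsigned char byte) {
    return static_cast<unsigned int>(byte);
}

// Extracting the high four bits of a byte held as an unsigned value.
unsigned int highNibble(unsigned char byte) {
    return static_cast<unsigned int>(byte) >> 4;
}
```

For example, `widenUnsigned(0xF0)` stays at 240, whereas `widenSigned(-16)` produces a value far larger than 255. With plain char, which of the two behaviours you get depends on the implementation.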

Finally, the formatted functions of streams are designed to output/parse a textual, human-readable representation of data, where for instance 123456789 could [1] be represented as the nine characters "123456789", which can fit in nine bytes. For comparison, a possible binary representation as 0x75BCD15 can fit in four bytes, which is more than twice as compact.

It is not entirely unexpected that what you're doing succeeds, since whether something is a binary file or not is simply determined by what you're doing with it. If you're writing text to the file, it is normal to retrieve that text back later on.

[1]: depending on e.g. locales, which is another feature specific to formatted functions.
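The size comparison above can be checked with a short sketch. It assumes the default "C" locale and a 32-bit integer type (`std::int32_t`); `textSize` and `binarySize` are hypothetical names for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>

// Number of bytes produced by formatted output of the value.
std::size_t textSize(std::int32_t value) {
    std::ostringstream out;
    out << value;
    return out.str().size();
}

// Number of bytes produced by writing the object representation directly.
std::size_t binarySize(std::int32_t value) {
    std::ostringstream out(std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str().size();
}
```

For 123456789, the formatted form takes nine bytes and the raw form takes four; for a small value such as 7, the text is actually shorter than the binary form.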

Luc Danton