Please ignore the people suggesting that you "write directly to the file". There are a number of issues with that, which ultimately fall into the category of "integer representation". There may appear to be compelling reasons to write integers straight to external storage using fwrite or the like, but there are some solid facts in play here.
The bottleneck is the external storage controller, or the network if you're writing a network application. Thus, writing two bytes with a single fwrite call or with two distinct fputc calls should be roughly the same speed, provided your memory profile is adequate for your platform. You can adjust the size of the buffer that your FILE * streams use, to a degree, with setvbuf (note: the C standard does not require the size to be a power of two, although such sizes are a common choice), so we can always fine-tune per platform based on what our profilers tell us. That information should probably also float gracefully upstream to the standard library, through gentle suggestions, so that it is useful for other projects too.
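To make the setvbuf remark concrete, here is a minimal sketch. The helper name open_buffered_tmp is my own invention for illustration, and I use tmpfile only so the example does not depend on any particular path. How much of the requested size the library honours when it allocates the buffer itself is implementation-defined, so treat the size as a hint.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a temporary binary stream and request a
 * fully buffered, library-allocated buffer of `size` bytes.
 * Returns NULL on failure. */
FILE *open_buffered_tmp(size_t size)
{
    assert(size > 0);

    FILE *stream = tmpfile();
    if (stream == NULL) {
        return NULL;
    }

    /* setvbuf must be called before any other operation on the stream.
     * Passing NULL as the buffer lets the library allocate it; the size
     * is then only a hint. A nonzero return value means failure. */
    if (setvbuf(stream, NULL, _IOFBF, size) != 0) {
        fclose(stream);
        return NULL;
    }
    return stream;
}
```

With a stream set up like this, two fputc calls and one two-byte fwrite both land in the same in-memory buffer first, which is why I'd expect the difference between them to be small. Measure before tuning.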
Underlying integer representations are inconsistent between today's computers. Suppose you write unsigned int values directly to a file using system X, which uses 32-bit ints and big endian representation. You'll end up with issues reading that file on system Y, which uses 16-bit ints and little endian representation, or on system Z, which uses 64-bit ints with mixed endian representation and 32 padding bits. Nowadays we have a mix ranging from 15-year-old computers that people torture themselves with to ARM big.LITTLE SoCs, smartphones and smart TVs, gaming consoles and PCs, all of which have their own quirks which fall outside of the realm of standard C, especially with regards to integer representation, padding and so on.
C was developed with abstractions in mind that allow you to express your algorithm portably, so that you don't have to write different code for each OS! Here's an example of reading and converting four hex digits to an unsigned int value, portably:
unsigned int value;
int value_is_valid = fscanf(fd, "%04x", &value) == 1;
assert(value_is_valid); // #include <assert.h>
/* NOTE: Actual error handling should occur in place of that
 * assertion
 */
I should point out the reason why I choose %04x and not %08x or something more contemporary. If we go by questions asked even today, there are unfortunately students using textbooks and compilers that are over 20 years old. Their int is 16-bit and technically their compilers are compliant in that respect (though they really ought to push gcc and llvm throughout academia). With portability in mind, here's how I'd write that value:
value &= 0xFFFF;
fprintf(fd, "%04x", value);
// side-note: We often don't check the return value of `fprintf`, but it can also become
// very important, particularly when dealing with streams and large files...
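Since I keep saying the return values matter, here's a minimal sketch of the same text-based approach with the checks in place. The function names write_hex16 and read_hex16 are hypothetical, and returning 0 or -1 is just one convention among many.

```c
#include <assert.h>
#include <stdio.h>

/* Write the low 16 bits of `value` as exactly four hex digits.
 * Returns 0 on success, -1 on failure. */
int write_hex16(FILE *stream, unsigned int value)
{
    assert(stream != NULL);
    value &= 0xFFFF;
    /* fprintf returns the number of characters written,
     * or a negative value on error. */
    return fprintf(stream, "%04x", value) == 4 ? 0 : -1;
}

/* Read at most four hex digits into *value.
 * Returns 0 on success, -1 on end of file or malformed input. */
int read_hex16(FILE *stream, unsigned int *value)
{
    assert(stream != NULL && value != NULL);
    /* fscanf returns the number of items converted, or EOF. */
    return fscanf(stream, "%4x", value) == 1 ? 0 : -1;
}
```

Because the writer always emits exactly four digits, consecutive values can be read back one after another without a separator.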
Supposing your unsigned int values occupy two bytes, here's how I'd read those two bytes, portably, using big endian representation:
int hi = fgetc(fd);
int lo = fgetc(fd);
unsigned int value = 0;
assert(hi >= 0 && lo >= 0); // again, proper error detection & handling logic should be here
value += hi & 0xFF; value <<= 8;
value += lo & 0xFF;
... and here's how I'd write those two bytes, in their big endian order:
fputc((value >> 8) & 0xFF, fd);
fputc(value & 0xFF, fd);
// and you might also want to check this return value (perhaps in a finely tuned end product)
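For what it's worth, here's roughly what the big endian pair might look like with the assertions swapped for real error checks. The names write_u16_be and read_u16_be are made up for this sketch, and I'm assuming 8-bit bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Write the low 16 bits of `value`, most significant byte first.
 * Returns 0 on success, -1 if either fputc fails. */
int write_u16_be(FILE *stream, unsigned int value)
{
    assert(stream != NULL);
    if (fputc((value >> 8) & 0xFF, stream) == EOF) return -1;
    if (fputc(value & 0xFF, stream) == EOF) return -1;
    return 0;
}

/* Read two bytes, most significant first, into *value.
 * Returns 0 on success, -1 on end of file or read error. */
int read_u16_be(FILE *stream, unsigned int *value)
{
    assert(stream != NULL && value != NULL);
    int hi = fgetc(stream);
    if (hi == EOF) return -1;
    int lo = fgetc(stream);
    if (lo == EOF) return -1;
    *value = (((unsigned int)hi & 0xFF) << 8) | ((unsigned int)lo & 0xFF);
    return 0;
}
```

Nothing here depends on how the host stores an unsigned int in memory. Only shifts and masks on the value are used, and those behave the same everywhere.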
Perhaps you're more interested in little endian. The neat thing is, the code really isn't that different. Here's input:
int lo = fgetc(fd);
int hi = fgetc(fd);
unsigned int value = 0;
assert(hi >= 0 && lo >= 0);
value += hi & 0xFF; value <<= 8;
value += lo & 0xFF;
... and here's output:
fputc(value & 0xFF, fd);
fputc((value >> 8) & 0xFF, fd);
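To show just how little differs between the two orders, here's a sketch that folds both into one pair of functions. The enum and function names are my own, not anything standard, and 8-bit bytes are assumed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for the two byte orders discussed above. */
enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* Write the low 16 bits of `value` in the requested byte order.
 * Returns 0 on success, -1 on write failure. */
int write_u16(FILE *stream, unsigned int value, enum byte_order order)
{
    assert(stream != NULL);
    int hi = (int)((value >> 8) & 0xFF);
    int lo = (int)(value & 0xFF);
    int first = (order == ORDER_BIG) ? hi : lo;
    int second = (order == ORDER_BIG) ? lo : hi;
    if (fputc(first, stream) == EOF || fputc(second, stream) == EOF) {
        return -1;
    }
    return 0;
}

/* Read two bytes in the requested byte order into *value.
 * Returns 0 on success, -1 on end of file or read error. */
int read_u16(FILE *stream, unsigned int *value, enum byte_order order)
{
    assert(stream != NULL && value != NULL);
    int first = fgetc(stream);
    if (first == EOF) return -1;
    int second = fgetc(stream);
    if (second == EOF) return -1;
    unsigned int hi = (unsigned int)(order == ORDER_BIG ? first : second) & 0xFF;
    unsigned int lo = (unsigned int)(order == ORDER_BIG ? second : first) & 0xFF;
    *value = (hi << 8) | lo;
    return 0;
}
```

The only thing the order changes is which byte goes through fputc or comes out of fgetc first. The arithmetic is identical.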
For anything larger than two bytes (i.e. a long unsigned or long signed), you might want to fwrite((char unsigned[]){ value >> 24, value >> 16, value >> 8, value }, 1, 4, fd); or something similar, to reduce boilerplate. With that in mind, it doesn't seem abusive to form a preprocessor macro:
#define write(fd, ...) fwrite((char unsigned[]){ __VA_ARGS__ }, 1, sizeof ((char unsigned[]){ __VA_ARGS__ }), fd)
I suppose one might look at this as choosing the lesser of two evils: preprocessor abuse or the magic number 4 in the code above, because now we can write(fd, value >> 24, value >> 16, value >> 8, value); without the 4 being hard-coded. A word for the uninitiated, though: the arguments appear twice in the expansion and side-effects might cause headaches, so don't go causing modifications, writes or global state changes of any kind in the arguments of write. Also note that the name write is already taken on POSIX systems, so you may want to pick a different one there.
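Here's a sketch of that macro idea in use for a four-byte value. I've named the macro WRITE_BYTES for this example to stay clear of the POSIX function, and write_u32_be and read_u32_be are likewise hypothetical names. It assumes C99 or later (compound literals, variadic macros) and 8-bit bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name, chosen to avoid clashing with POSIX write().
 * The byte count comes from the size of the compound literal, so
 * there is no hard-coded 4. */
#define WRITE_BYTES(stream, ...) \
    fwrite((unsigned char[]){ __VA_ARGS__ }, 1, \
           sizeof ((unsigned char[]){ __VA_ARGS__ }), (stream))

/* Write the low 32 bits of `value`, most significant byte first.
 * Returns 0 on success, -1 on a short write. */
int write_u32_be(FILE *stream, unsigned long value)
{
    assert(stream != NULL);
    size_t written = WRITE_BYTES(stream,
                                 (value >> 24) & 0xFF, (value >> 16) & 0xFF,
                                 (value >> 8) & 0xFF, value & 0xFF);
    return written == 4 ? 0 : -1;
}

/* Read four bytes, most significant first, into *value.
 * Returns 0 on success, -1 on a short read. */
int read_u32_be(FILE *stream, unsigned long *value)
{
    assert(stream != NULL && value != NULL);
    unsigned char bytes[4];
    if (fread(bytes, 1, sizeof bytes, stream) != sizeof bytes) return -1;
    *value = ((unsigned long)bytes[0] << 24) | ((unsigned long)bytes[1] << 16)
           | ((unsigned long)bytes[2] << 8)  |  (unsigned long)bytes[3];
    return 0;
}
```

I use unsigned long here because it is guaranteed to hold at least 32 bits, whereas unsigned int is only guaranteed 16.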
Well, that's my update to this post for the day... Socially delayed geek person signing out for now.