File and networking portability among different byte sizes

Question

In C, the fread function is like this:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

Usually char* arrays are used as the buffer. People usually assume that char = 8 bits. But what if that isn't true? What happens if files written on systems with 8-bit bytes are read on systems with 10-bit bytes? Is there any single standard on portability of files and network streams between systems with bytes of different sizes? And most importantly, how do you write portable code in this regard?

Probably 8-bit bytes are just widened up to 10 bits adding zeroes as MSbits, since it's in the best interest of those "strange" systems to be compatible with the rest of the world that uses 8-bit bytes. Also, AFAIK all these systems are "strange" systems (DSPs, old mainframes, ...) that do not usually deal with "regular" files produced by "regular" machines. — Matteo Italia, Dec 22 '12 at 16:33
I just received an answer on another question saying that these typedefs are only available if the machine directly supports them. — lamefun, Dec 22 '12 at 16:40
http://en.cppreference.com/w/cpp/types/integer --- it says "(provided only if the implementation directly supports the type)" — lamefun, Dec 22 '12 at 16:44
@Mellowcandle If `char`s are 10 bits, the implementation _cannot_ provide `uint8_t`. The fixed size types from `stdint.h` must not have padding, and every type needs a multiple of `CHAR_BIT` bits of storage. — Daniel Fischer, Dec 22 '12 at 22:08
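The point in the comment above can be checked at compile time. The following is only a sketch: `octet_t` and `make_octet` are hypothetical names, not part of any standard header. It relies on `UINT8_MAX` being defined only when the implementation provides `uint8_t`, and falls back to `uint_least8_t`, which always exists but may be wider than 8 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* octet_t is a hypothetical name: use uint8_t where the implementation
   provides it, otherwise the smallest type with at least 8 bits. */
#ifdef UINT8_MAX
typedef uint8_t octet_t;
#else
typedef uint_least8_t octet_t;
#endif

/* Hypothetical helper: keep only the low 8 bits of a value, so the result
   is a valid octet even when octet_t is wider than 8 bits. */
static octet_t make_octet(unsigned value)
{
    return (octet_t)(value & 0xFFu);
}
```

On a machine with `CHAR_BIT == 10`, the fallback branch would be taken and `octet_t` would occupy at least 10 bits, with only the low 8 in use.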

score 3 · Answer 1 · answered Dec 22 '12 at 17:44

With regard to network communications, the physical access protocols (like Ethernet) define how many bits go into a "unit of information", and it is up to the implementation to map this unit to an appropriate type. So for network communications, supporting weird architectures is not a problem.

For file access, things get more interesting if you want to support weird architectures, because there are no standards to refer to, and even the method of putting the files on the system may influence how you can access them. Fortunately, the only systems currently in use that don't support 8-bit bytes are DSPs and similar small embedded systems that don't support a filesystem at all, so the issue is essentially moot.

score 1 · Answer 2 · answered Dec 22 '12 at 16:36

Systems with byte sizes other than 8 bits are pretty rare these days. But there are machines with other sizes, and files are not guaranteed to be portable to those machines.

If uberportability is required, then you will have to have some sort of encoding in your file that copes with char != 8 bits.
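As a minimal sketch of what such an encoding could look like (an illustration, not a standard): define the file format in terms of octets, store each octet in one `unsigned char`, and mask to the low 8 bits so the code behaves the same whatever `CHAR_BIT` is. The helper names `put_u32_be` and `get_u32_be` and the big-endian layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as four octets, most
   significant octet first. Each octet occupies one unsigned char and
   uses only its low 8 bits, regardless of CHAR_BIT. */
static void put_u32_be(unsigned char out[4], uint_least32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: rebuild the value from four octets. Masking each
   element discards any bits above the low 8 that a wider char could hold. */
static uint_least32_t get_u32_be(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint_least32_t)(in[0] & 0xFFu) << 24)
         | ((uint_least32_t)(in[1] & 0xFFu) << 16)
         | ((uint_least32_t)(in[2] & 0xFFu) << 8)
         |  (uint_least32_t)(in[3] & 0xFFu);
}
```

The buffer would then be written and read with `fwrite(buf, 1, 4, file)` and `fread(buf, 1, 4, file)`. How the platform maps external 8-bit data onto wider chars is still up to the implementation, so this only covers the part the program itself controls.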

Do you have something in mind where this may have to run on a DEC 10, really old IBM mainframes, DSPs or some such, or are you just asking out of curiosity? If the latter, I would just ignore the case. Machines that don't have 8-bit characters are pretty special, and you will most likely have OTHER problems than bits per char when using your "files" on such a system, like how to get the file there in the first place, as you probably can't plug in a USB stick or transfer it with FTP (although the latter is perhaps the most likely option).
