c++ normalizing data sizes across systems

Question

I have a struct with three variables: two unsigned ints and an unsigned char. From my understanding, a C++ char is always 1 byte regardless of the operating system. The same can't be said for other data types. I am looking for a way to normalize PODs so that when they are saved into a binary file, the resulting file is readable on any operating system that the code is compiled for.

I changed my struct to use a 1-byte alignment by adding #pragma as follows:

#pragma pack(push, 1) 
struct test
{
   int a;
};
#pragma pack(pop)

But I don't think that guarantees int a is exactly 4 bytes on every OS. Is there a way to ensure that a file saved from my code will always be readable?

Don't they have types like `int32` in certain libraries for this purpose? — mpen, Dec 28 '10 at 18:48
In C++, it is guaranteed that `sizeof(char) == 1`. The number of bits is not guaranteed, and there's nothing except tradition to stop somebody from making an implementation where a `char` is two bytes. — David Thornley, Dec 28 '10 at 18:52
@David Thornley: it's more quirky. ISO C guarantees that a char has 1 byte. However, "byte" is the unit of storage that C operates on (i.e. what sizeof and malloc count in), and (as you say) there is no guarantee on the number of bits; some implementation may have bytes with 64 bits, for example. The implementation defines CHAR_BIT for that. — Martin v. Löwis, Dec 28 '10 at 18:56
Related: http://stackoverflow.com/questions/4329777/is-long-guaranteed-to-be-at-least-32-bits — John Dibling, Dec 28 '10 at 19:03
@David: A `char` is *always* exactly one byte. What may change is how many bits are in a byte, not how many bytes a `char` is. — John Dibling, Dec 28 '10 at 19:05
It's easier to use a human-readable format. It's easy to build, easy to read, and easy to modify by hand when debugging. Binary formats are rarely worth the extra effort. — Martin York, Dec 28 '10 at 19:27

score 3 · Accepted Answer · answered Dec 28 '10 at 18:49

You can find fixed-width integer types (like std::int32_t and std::uint16_t) in <cstdint>. Your C++ Standard Library implementation may not include <cstdint> (it's not part of the current C++ standard; it's part of C++0x), in which case Boost has an implementation that should work on most platforms.

Note that you will still have to think about endianness and alignment, among other things. If your code needs to run on platforms with different numeric representations (e.g. one's complement and two's complement), you'll need to consider that too.
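
As a rough sketch of the fixed-width approach, assuming a C++11 compiler and 8-bit bytes, the struct from the question might be declared as follows. The names `Record`, `kRecordWireSize`, and `to_u32` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical record: two unsigned 32-bit integers and one unsigned byte.
struct Record {
    std::uint32_t a;
    std::uint32_t b;
    std::uint8_t  c;
};

// Assumption: bytes are 8 bits wide, so uint32_t occupies exactly 4 of them.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t should be 4 bytes here");

// Bytes the fields need in a file; sizeof(Record) may be larger due to padding.
constexpr std::size_t kRecordWireSize =
    2 * sizeof(std::uint32_t) + sizeof(std::uint8_t);

// Narrow a native unsigned long to the fixed-width type, checking that it fits.
inline std::uint32_t to_u32(unsigned long value) {
    assert(value <= std::numeric_limits<std::uint32_t>::max());
    return static_cast<std::uint32_t>(value);
}
```

Fixed-width types settle only the size of each field. Byte order and padding still have to be handled when the record is written out.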

James McNellis

I'm mostly concerned with it being portable between 32-bit windows, linux, mac, and 64-bit windows, linux, mac. I don't think I'm going to be compiling to anything that would differ that greatly. – Bocochoco Dec 28 '10 at 18:52
Good point :) He's probably better off using a text-based save file if he wants cross-platformness, methinks. – mpen Dec 28 '10 at 18:53
Why incur the penalty of format conversion when a simple byte swap will fix it? http://en.wikipedia.org/wiki/Swab_%28programming%29 – Jay Dec 28 '10 at 19:17
The C99 fixed-width signed types are guaranteed to be two's complement. If your code needs to run on platforms with different numeric representations, you have to consider that those platforms probably will not provide `int32_t`. – Steve Jessop Dec 28 '10 at 20:24

score 0 · Answer 2 · answered Dec 28 '10 at 19:00

If you are only concerned with 32-bit Windows, 64-bit Windows, Linux (x86 and AMD64) and Mac (x86, AMD64, PPC), then it's easier. An int is 32 bits on all of these systems. If you can afford to drop PPC, they are also all little-endian. If you need to support big-endian systems, I recommend storing data in network byte order, using htonl when writing and ntohl when reading.
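
A minimal sketch of the network-byte-order idea, assuming a POSIX or Windows toolchain that provides `htonl`/`ntohl`. The helper names `put_u32_net` and `get_u32_net` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#ifdef _WIN32
#include <winsock2.h>   // htonl/ntohl on Windows (link against Ws2_32)
#else
#include <arpa/inet.h>  // htonl/ntohl on POSIX systems
#endif

// Append a 32-bit value to buf in network (big-endian) byte order.
inline void put_u32_net(std::vector<unsigned char>& buf, std::uint32_t value) {
    const std::uint32_t net = htonl(value);
    unsigned char bytes[sizeof net];
    std::memcpy(bytes, &net, sizeof net);  // byte copy avoids alignment issues
    buf.insert(buf.end(), bytes, bytes + sizeof net);
}

// Read a 32-bit value stored in network byte order at the given offset.
inline std::uint32_t get_u32_net(const std::vector<unsigned char>& buf,
                                 std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= buf.size());
    std::uint32_t net;
    std::memcpy(&net, buf.data() + offset, sizeof net);
    return ntohl(net);
}
```

On a little-endian host `htonl` swaps the bytes and on a big-endian host it does nothing, so the bytes in the file come out the same either way.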

Martin v. Löwis

I'm doing my best to use just POD types. It's harder than I had anticipated. – Bocochoco Dec 28 '10 at 19:12

score 0 · Answer 3 · answered Dec 28 '10 at 19:00

There's no way to just write a binary struct out like that and have it readable by any system. While you can use some library that defines types like int32, that doesn't solve your problem.

Different processors use different byte orders, and may require different alignment. Further, padding is implementation-dependent. Fortunately, all current processors I'm aware of use two's complement for integer representation rather than one's complement or sign-magnitude, so integers at least have the same binary representation (modulo byte order).

No #pragma can be a truly portable solution, since pragmas are by definition implementation-defined, and you can't be certain that different compilers will treat them the same. There are some more specifiers being worked on for the next C++ standard, but they're not going to be all that common for some time.

What you are going to have to do is specify the struct with something like int32, and then break it down into a stream of bytes and build it back up again on the other end. Look up "serialization".
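
The "break it down into a stream of bytes" step could look roughly like the sketch below. It assumes a little-endian file layout and 8-bit bytes; both are choices made for this example, not something the answer prescribes. `Record`, `kWireSize`, `encode`, and `decode` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record: fixed-width versions of the question's three fields.
struct Record {
    std::uint32_t a;
    std::uint32_t b;
    std::uint8_t  c;
};

constexpr std::size_t kWireSize = 9;  // 4 + 4 + 1 bytes, no padding

// Write v as 4 bytes, least significant byte first (assumed file convention).
inline void put_u32_le(unsigned char* out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }
}

// Rebuild a 32-bit value from 4 bytes, least significant byte first.
inline std::uint32_t get_u32_le(const unsigned char* in) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    return v;
}

// Field-by-field encoding: independent of host byte order and struct padding.
inline std::array<unsigned char, kWireSize> encode(const Record& r) {
    std::array<unsigned char, kWireSize> out{};
    put_u32_le(out.data(), r.a);
    put_u32_le(out.data() + 4, r.b);
    out[8] = r.c;
    return out;
}

inline Record decode(const unsigned char* data, std::size_t size) {
    assert(size >= kWireSize);  // caller must supply a full record
    Record r;
    r.a = get_u32_le(data);
    r.b = get_u32_le(data + 4);
    r.c = data[8];
    return r;
}
```

The encoded array can then be handed to a stream, for example `of.write(reinterpret_cast<const char*>(bytes.data()), bytes.size())`. Because the shifts operate on values rather than on memory, the same code produces the same file on little-endian and big-endian hosts.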

So writing the struct directly to the stream won't work? As in (for a vector of structs in this case): `of.write((char*)&col[0], sizeof(struct));` It writes and reads back in just fine now, but I haven't had the chance to test it on other machines. — Bocochoco, Dec 28 '10 at 19:13
@Bocochoco: That will work just fine for one compiler on one OS on one computer, but when you start writing and reading on different systems you may find cases where it doesn't work. If you're using an x86-based and a PPC-based system (like an older Mac), for example, it will definitely fail. — David Thornley, Dec 28 '10 at 20:13

score 0 · Answer 4 · answered Dec 28 '10 at 22:14

The proper way to do this is to serialise the data in a standard format. There are many standards for this. For simplicity, CSV (comma-separated values) is one option. If you want a more efficient standard, try XDR or ASN.1, which is used a lot in the telecom industry.
