5

If you have a binary output stream and write integers to a file on a 32-bit Windows computer, would you then be able to read the same integers from that same file on a 64-bit Windows computer?

My guess would be no, since an integer on a 32-bit computer is 4 bytes, whereas an integer on a 64-bit computer is 8 bytes.

So does the following code work if the files have to be readable and writable on both 64-bit and 32-bit computers, regardless of OS, computer architecture, and data type? If not, how can this be done, given that the files have to be in binary form?

Writing

std::ofstream ofs("example.bin", std::ios::binary);

int i = 128;
ofs.write((char*) (&i), sizeof(i));

ofs.close();

Reading

std::ifstream ifs("example.bin", std::ios::binary);

int i = 0;
ifs.read((char*) (&i), sizeof(i));

ifs.close();
vallentin
  • 1
    `sizeof(int) == 4` on both 32-bit and 64-bit Windows systems, as I understand it. It's `long` and pointer types that change. – Joe Z Dec 23 '13 at 00:17
  • @JoeZ Mhm, that's not what I'm reading on [this other](http://stackoverflow.com/questions/14256695/why-does-sizeofint-vary-across-different-operating-systems) Stack Overflow question, concerning `sizeof` – vallentin Dec 23 '13 at 00:20
  • 3
    @Vallentin: If you're using C++11, just include `<cstdint>` and use the correct fixed-width integer type (e.g. `int32_t`). – Zeta Dec 23 '13 at 00:21
  • There is the interpretation of a standard, and there is an implementation. It is a lot faster to check the implementation by printing sizeof(int) on both platforms than to search the internet for the answer. If you print sizeof(int) you will find, as JoeZ says, that it is 4 on both platforms, from VS8 to VS12. – cup Dec 23 '13 at 00:28
  • @Vallentin: Going from 16-bit MS-DOS to 32-bit did involve a size change for `int`. Going from 32-bit to 64-bit did not. As @Zeta said, you can use the fixed-width types to make the width assumptions clearer in your code. BTW, another thing to watch out for is that structure layout likely differs between 32-bit and 64-bit builds due to different alignment rules. So be careful. – Joe Z Dec 23 '13 at 00:45

2 Answers

5

While int is 4 bytes on almost all modern platforms (32-bit and 64-bit), there is no guarantee of its size. So for serializing data into a file or another binary stream, you should prefer the fixed-width integer types from the header <cstdint>, which was introduced in C++11 (some compilers also provide it in C++03 mode):

#include <cstdint>

...
int32_t i = 128;
ofs.write((char*)(&i), sizeof(i));
...

Another option is to enforce a certain type to have a certain size, e.g. int to have size 4. To make sure your program won't compile if this was not true, use static_assert:

...
int i = 128;
static_assert(sizeof(i) == 4, "Field i has to have size 4.");
ofs.write((char*)(&i), sizeof(i));
...

While this sounds stupid considering we have fixed-width integers as above, it can be useful if you want to store a whole struct whose layout you made assumptions about, based on a certain version of some library. Example: vec4 from glm is documented to contain four floats, so when serializing this struct, it's good to check this statically in order to catch future library changes (unlikely but possible).
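
As a rough sketch of such a check: the `Vec4` struct below is a hypothetical stand-in defined only for this example (it is not glm's actual type), and the exact set of assertions is an assumption about what a file format might rely on.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical stand-in for a library type such as a four-float vector.
struct Vec4 {
    float x, y, z, w;
};

// Assumptions the (hypothetical) file format relies on; compilation fails
// if any of them stops holding.
static_assert(sizeof(float) == 4,
              "file format assumes 4-byte floats");
static_assert(sizeof(Vec4) == 4 * sizeof(float),
              "Vec4 must be exactly four floats with no padding");
static_assert(offsetof(Vec4, w) == 3 * sizeof(float),
              "unexpected padding between Vec4 members");
static_assert(std::is_trivially_copyable<Vec4>::value,
              "Vec4 must be safe to copy byte by byte");
```

These checks only cover size and layout; the byte order of each float is a separate concern.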

Another very important thing to consider, however, is the endianness (byte order) of integral types, which varies among platforms. Modern x86 and x86-64 desktop platforms store integers in little-endian order, so I'd prefer this for your binary file format; but if the platform uses big endian you need to convert the value (reverse the byte order).
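
One way to sidestep endianness detection entirely is to build the bytes with shifts, so the file layout no longer depends on how the host stores integers. The following is a minimal sketch of that idea; the helper names `write_u32_le`, `read_u32_le` and `round_trip` are made up for this example, and a signed `int32_t` would need a cast to and from `uint32_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Write a 32-bit value as four bytes, least significant byte first,
// independent of the host's native byte order.
void write_u32_le(std::ostream& os, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF),
    };
    os.write(bytes, 4);
}

// Read four bytes produced by write_u32_le and reassemble the value.
// Returns false if the stream could not deliver four bytes.
bool read_u32_le(std::istream& is, std::uint32_t& v) {
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), 4)) return false;
    v = static_cast<std::uint32_t>(bytes[0])
      | (static_cast<std::uint32_t>(bytes[1]) << 8)
      | (static_cast<std::uint32_t>(bytes[2]) << 16)
      | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}

// Usage sketch: send a value through an in-memory stream and back.
std::uint32_t round_trip(std::uint32_t v) {
    std::stringstream ss;
    write_u32_le(ss, v);
    std::uint32_t out = 0;
    const bool ok = read_u32_le(ss, out);
    assert(ok);
    (void)ok;  // avoid an unused-variable warning when NDEBUG is defined
    return out;
}
```

The same functions work with a `std::ofstream` or `std::ifstream` opened with `std::ios::binary`, since they only require `std::ostream` and `std::istream`.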

leemes
  • How do you control the endianness of streams? – vallentin Dec 23 '13 at 00:51
  • 1
    Endianness is handled in your part of the code. In the piece of code you posted, you reinterpret the integer pointer as a char pointer, so the integer is read as a series of bytes in whatever order the compiler / platform lays out integral types. So the key is to detect the platform endianness (which I don't know how to do) and to define the endianness of your file format (you can choose either; I recommend little endian), and if they don't match, reverse the bytes using a helper function. This can look similar to `char b[4]; b[3] = *p++; b[2] = *p++; b[1] = *p++; b[0] = *p; ofs.write(b, 4);` – leemes Dec 23 '13 at 00:55
  • See this: http://stackoverflow.com/questions/4239993/determining-endianness-at-compile-time – leemes Dec 23 '13 at 01:00
  • @Vallentin: A good idea is not to. Given `int32_t i`, just write chars `i & 0xFF`, `(i /256) & 0xFF`, `(i/65536) & 0xFF` and `(i/16777216) & 0xFF`. – MSalters Dec 23 '13 at 08:37
  • @MSalters For better readability I'd use bit shifting instead of division. Especially when we expand this to `int64_t` ;) – leemes Dec 23 '13 at 10:11
  • @leemes Taking a look at `int32_t` I found out that it's simply a `typedef int int32_t;`, so I guess that wouldn't actually solve my problem using `int32_t`. – vallentin Dec 23 '13 at 18:40
  • You looked at one implementation of the standard library. So this `typedef` is valid for your compiler / platform. It is different on other systems. As I said: `int` is 4 bytes on *most but not all* systems, and `int32_t` is guaranteed to be 4 bytes, so these systems typically simply use this typedef to define `int32_t`. – leemes Dec 23 '13 at 18:53
1

There's no guarantee for the size of an int in C++. All the standard guarantees is that it is at least 16 bits wide, at least as big as a short int, and no larger than a long int. The compiler is free to choose an appropriate size within these constraints. While most will choose 32 bits as the size of an int, some won't.

If you need a type that is always 32 bits, you can use the int32_t type.

#include <stdint.h>

to get this type.
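
To make the difference concrete, here is a small sketch that contrasts what the language guarantees about `int` with what `int32_t` promises. It uses the C++ header `<cstdint>` rather than `<stdint.h>`, and it assumes the platform provides `int32_t` at all, which is true of mainstream desktop compilers.

```cpp
#include <climits>  // CHAR_BIT
#include <cstdint>  // std::int32_t

// Guaranteed by the language on every conforming compiler:
static_assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long),
              "int lies between short and long in size");
static_assert(sizeof(int) * CHAR_BIT >= 16,
              "int has at least 16 bits");

// Guaranteed wherever int32_t exists: exactly 32 bits.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32,
              "int32_t is exactly 32 bits wide");
```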

StevieB