I have a variable of type uint8_t which I'd like to serialize and write to a file (which should be quite portable, at least for Windows, which is what I'm aiming at).

Trying to write it to a file in its binary form, I came across this working snippet:

uint8_t m_num = 3;
unsigned int s = (unsigned int)(m_num & 0xFF);
file.write((wchar_t*)&s, 1); // file = std::wofstream

First, let me make sure I understand what this snippet does - it takes my var (which is basically an unsigned char, 1 byte long), converts it into an unsigned int (which is 4 bytes long, and not so portable), and using & 0xFF "extracts" only the least significant byte.

Now, there are two things I don't understand:

  1. Why convert it into unsigned int in the first place, why can't I simply do something like
    file.write((wchar_t*)&m_num, 1); or reinterpret_cast<wchar_t *>(&m_num)? (Ref)
  2. How would I serialize a longer type, say a uint64_t (which is 8 bytes long)? unsigned int may or may not be enough here.
Asaf
  • That code is horrible, not portable, and has undefined behaviour. It's also targeting a stream of wide characters which you probably aren't. – Alan Stokes Jun 11 '16 at 08:23
  • Hi @AlanStokes, thank you for your comment. Could you please elaborate why this code is bad and not portable? What would be a wiser way to do that? – Asaf Jun 11 '16 at 08:27

2 Answers

uint8_t is 1 byte, the same size as char.

wchar_t is 2 bytes on Windows and typically 4 bytes on Linux. Its in-memory representation also depends on endianness. You should avoid wchar_t if portability is a concern.

You can just use std::ofstream. On Windows, the Microsoft implementation provides an additional std::ofstream constructor that accepts a wide-character (UTF-16) file name. This way your code works with Windows UTF-16 file names and you can still use std::fstream. For example:

int i = 123;
std::ofstream file(L"filename_in_unicode.bin", std::ios::binary);
file.write((char*)&i, sizeof(i)); //sizeof(int) is 4
file.close();
...
std::ifstream fin(L"filename_in_unicode.bin", std::ios::binary);
fin.read((char*)&i, sizeof(i)); // i is now 123

This is relatively simple because it's only storing integers. This will work across different Windows systems, because Windows platforms are little-endian and int is 4 bytes on all of them.

But some systems are big-endian, and you would have to deal with that separately.
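One way to find out which case applies at run time is to look at the first byte of a known integer. This is only a sketch: the function name host_is_little_endian is made up for illustration, and it assumes the host is either plain little-endian or plain big-endian.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reports whether the least significant byte of an
// integer is stored first in memory on the machine running the code.
bool host_is_little_endian() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // copy only the first byte in memory
    return first == 0x04;            // 0x04 is the least significant byte
}
```

A file written with file.write((char*)&i, sizeof(i)) on a host where this returns true would need its bytes reversed when read on a host where it returns false.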

If you use standard I/O, for example fout << 123456, then the integer will be stored as the text "123456". Standard I/O is compatible across systems, but it takes a little more disk space and can be a little slower.
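To make the difference concrete, the sketch below produces both representations of the same value in memory. The helper names as_text and as_raw_bytes are hypothetical, and a 32-bit integer is assumed.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Formatted output: the number is converted to its decimal digits.
std::string as_text(std::int32_t value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unformatted output: the object's bytes are copied as they sit in memory,
// so their order follows the host's endianness.
std::string as_raw_bytes(std::int32_t value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return out.str();
}
```

For 123456 the text form is the six characters "123456", while the raw form is always four bytes, whatever the value.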

It's compatibility versus performance. If you have large amounts of data (several megabytes or more) and you can deal with compatibility issues in the future, then go ahead with writing bytes. Otherwise it's easier to use standard I/O. The performance difference is usually not measurable.

Barmak Shemirani
  • Hi @BarmakShemirani, thank you for your answer! It does make a lot of sense now! Obviously char is much more portable I guess. Could you please explain why it's better to use standard i/o? and what other portability issues are present? – Asaf Jun 11 '16 at 12:09
  • I added more explanation in the answer. There is also an issue with text. If you want compatibility with other systems it is common practice to convert UTF16 to UTF8. I don't know if you are including text in your file, I didn't get into that. – Barmak Shemirani Jun 11 '16 at 16:08

It is impossible to write uint8_t values to a wofstream because a wofstream only writes wide characters and doesn't handle binary values at all.

If what you want to do is to write a wide character representing a code point between 0 and 255, then your code is correct.

If you want to write binary data to a file then your nearest equivalent is ofstream, which will allow you to write bytes.

To answer your questions:

  1. wofstream::write writes wide characters, not bytes. If you reinterpret the address of m_num as the address of a wide character, you will be writing a 16-bit or 32-bit (depending on platform) wide character of which the first byte (that is, the least significant or most significant, depending on platform) is the value of m_num and the remaining bytes are whatever happens to occur in memory after m_num. Depending on the character encoding of the wide characters, this may not even be a valid character. Even if valid, it is largely nonsense. (There are other possible problems if wofstream::write expects a wide-character-aligned rather than a byte-aligned input, or if m_num is immediately followed by unreadable memory).

  2. If you use wofstream then this is a mess, and I shan't address it. If you switch to a byte-oriented ofstream then you have two choices:
     a. If you will only ever be reading the file on the same system, file.write(reinterpret_cast<const char*>(&myint64value), sizeof(myint64value)) will work. The order in which the bytes of the 64-bit value are written depends on the platform, but the same order will be used when you read back, so this doesn't matter. Don't try to do something analogous with wofstream, because it's dangerous!
     b. Extract each of the 8 bytes of myint64value separately (shift right by a multiple of 8 bits and then take the bottom 8 bits) and then write it. This is fully portable because you control the order in which the bytes are written.

nugae
  • Thanks @nugae! About point #2, the problem is endianness, is that right? Using functions like `htons`, `htonl` and friends (basically setting the standard to big-endian) would solve the problem, am I wrong? – Asaf Jun 11 '16 at 12:22
  • Yes, it is endianness. As long as you stay within one system, it doesn't matter, but if you want inter-system compatibility then it does. `htonl` and its relatives would work but (according to the documentation) they only go up as far as `uint32_t`. So if you wanted to do `uint64_t` then you would have to do the bottom half (`&0xffffffffU`) and the top half (`>>32`) separately. You could pack that into your own `htonl64` function, or (better) into your own `write64` and `read64` functions. – nugae Jun 12 '16 at 11:34