
This question is about endianness.

My goal is to write 2 bytes to a file for a game. I want to make sure that people with different computers get the same result, whether their machines are little- or big-endian.

Which of these snippets should I use?

char a[2] = { 0x5c, 0x7B };
fout.write(a, 2);

or

int a = 0x7B5C;
fout.write((char*)&a, 2);

Thanks a bunch.

Adrian Mole
Cyclone
  • `fout << 2`? ASCII works everywhere (UTF-8 too). – MSalters Mar 30 '20 at 19:28
  • FWIW, I don't agree with the proposed duplicate, because it is only about detecting the endianness, nothing more. – 463035818_is_not_an_ai Mar 30 '20 at 19:29
  • `Goal is to write 2 bytes in a file for a game on a computer`: if you want to write `0x5c`, `0x7B` in that order to the file, then `char a[2] = { 0x5c, 0x7B };` would be consistent on all systems. But the question is, what do those two bytes represent? If they represent the number `31580`, as `int a = 0x7B5C;` indicates, then you still need to convert the number `31580` to the correct byte order. – t.niese Mar 30 '20 at 19:47
  • Neither. You need to decide on a file format (e.g. big-endian or little endian) and convert data to the file format in order to write, and convert back on reading. To do that manually, your program will need to detect host endianness. If you are doing socket programming, you will have access to functions like `htons()` (convert from host to network byte order) and `ntohs()` (the reverse) to do those conversions transparently. Those functions aren't standard C++, but are supported by a number of host platforms which each "know" their endianness. (BTW: network byte order = big endian) – Peter Mar 30 '20 at 19:53
  • If the goal is only to write two bytes, the problem is trivial: just write one byte then the other. If the goal is to write values of various integer types, endianness is only part of the problem. For example, `int` can have different sizes on different platforms; negative numbers can have different representations on different platforms. Standard C and standard C++ binary files are only guaranteed to be readable by the program that wrote them. The simplest approach is to use text files, not binary. – Pete Becker Mar 30 '20 at 20:09

3 Answers


Which of these snippet do I use?

The first one. It produces the same output regardless of native endianness.

But you'll find that if you need to interpret those bytes as an integer value, it is not so straightforward: char a[2] = { 0x5c, 0x7B } can represent either 0x5C7B (big endian) or 0x7B5C (little endian). So which one did you intend?

The solution for cross-platform interpretation of integers is to decide on a particular byte order for reading and writing. The de facto "standard" for cross-platform data is big endian.

To write a number in big endian, start by bit-shifting the input value right so that the most significant byte is in the place of the least significant byte. Mask all other bytes (technically redundant in first iteration, but we'll loop back soon). Write this byte to the output. Repeat this for all other bytes in order of significance.

This algorithm produces the same output regardless of the native endianness; it will even work on exotic "middle-endian" systems, if you ever encounter one. Writing in little endian is similar, but in reverse order.
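The shift-and-mask writing algorithm above could be sketched roughly like this. It is a minimal illustration, not a library API: the function names write_big_endian and write_little_endian are hypothetical, and the code assumes 8-bit bytes (CHAR_BIT == 8) and unsigned integer types.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>  // std::ostringstream, used in the usage example
#include <string>
#include <type_traits>

// Writes 'value' to 'out' most significant byte first (big endian).
// The shifts and masks operate on the value, not on its memory layout,
// so the output does not depend on the host's byte order.
template <typename UInt>
void write_big_endian(std::ostream& out, UInt value) {
    static_assert(std::is_unsigned<UInt>::value, "use an unsigned integer type");
    for (std::size_t i = sizeof(UInt); i-- > 0;) {
        // Move byte number i (0 = least significant) into the lowest 8 bits, then mask.
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        out.put(static_cast<char>(byte));
    }
}

// Same idea, least significant byte first (little endian).
template <typename UInt>
void write_little_endian(std::ostream& out, UInt value) {
    static_assert(std::is_unsigned<UInt>::value, "use an unsigned integer type");
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        out.put(static_cast<char>(byte));
    }
}
```

For example, write_big_endian(out, std::uint16_t{0x7B5C}) emits the bytes 0x7B, 0x5C on every platform, while write_little_endian emits 0x5C, 0x7B, which is the same byte sequence as the question's char a[2] = { 0x5c, 0x7B }.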

To read a big endian value, read the first byte of input, shift it left so that it goes to the place of most significant byte. Combine the shifted byte with the result (initially zero) using bitwise-or. Repeat with the next byte by shifting to the second most significant place and so on.
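The reading side described above might look like this minimal sketch (hypothetical name read_big_endian, assuming 8-bit bytes and an unsigned target type). Each byte is converted through unsigned char first so that bytes of 0x80 and above are not sign-extended before being shifted into place and combined with bitwise-or.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example
#include <string>
#include <type_traits>

// Reads sizeof(UInt) bytes from 'in', most significant byte first (big endian).
// Returns false if the stream runs out of data before all bytes are read.
template <typename UInt>
bool read_big_endian(std::istream& in, UInt& result) {
    static_assert(std::is_unsigned<UInt>::value, "use an unsigned integer type");
    UInt value = 0;  // the result starts at zero
    for (std::size_t i = sizeof(UInt); i-- > 0;) {
        char c;
        if (!in.get(c)) return false;
        // Shift the byte into position i (0 = least significant) and OR it in.
        UInt byte = static_cast<UInt>(static_cast<unsigned char>(c));
        value = static_cast<UInt>(value | (byte << (8 * i)));
    }
    result = value;
    return true;
}
```

Reading the question's two bytes 0x5C, 0x7B this way gives 0x5C7B; interpreting the same bytes as little endian would give 0x7B5C instead, which is why the file format has to pin down one order.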

Also use this function to know the endianness of a computer?

To determine the endianness of a system, you can use std::endian (from the <bit> header) in C++20. Before that, you can use implementation-specific macros from the endian.h header, or do a simple runtime check like the one you suggest.
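Both options mentioned above can be combined in a small sketch like the following. The function names are hypothetical; the code uses std::endian only when the library advertises it via the __cpp_lib_endian feature-test macro, and otherwise falls back to a runtime check that copies the bytes of a known value with std::memcpy instead of casting pointers.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>  // std::endian (C++20)
#  endif
#endif

// Runtime check that works before C++20: copy the first byte of a
// 16-bit value 0x0001 and see whether the low byte is stored first.
bool is_little_endian_runtime() {
    const std::uint16_t probe = 0x0001;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

bool is_little_endian() {
#if defined(__cpp_lib_endian)
    // Compile-time answer from the standard library (C++20).
    return std::endian::native == std::endian::little;
#else
    return is_little_endian_runtime();
#endif
}
```

As the answer points out, the shift-based reading and writing shown earlier do not need this check at all; it is only useful if you want to byte-swap raw memory.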

But you never really need to know the endianness of a system. You can simply use the algorithms that I described, which work on systems of any endianness without having to know what that endianness is.

eerorika
  • Why would `char a[2] = { 0x5c, 0x7B }; fout.write(a, 2);` alone be inconsistent between different systems? – t.niese Mar 30 '20 at 19:48
  • @t.niese whoops. I did not read the question properly. Answer fixed. – eerorika Mar 30 '20 at 19:53
  • So... I should use this, and it will be the same on every computer? `char a[2] = { 0x5c, 0x7B }; fout.write(a, 2);` And to reverse the byte order, I just reverse the contents of the `char a[2]` array? Also use this function to know the endianness of a computer? `int num = 1; if(*(char *)&num == 1) { printf("\nLittle-Endian\n"); } else { printf("Big-Endian\n"); }` Thanks again. SORRY for the unformatted text... – Cyclone Mar 30 '20 at 21:47
  • @Cyclone See the edit. – eerorika Mar 31 '20 at 05:28
  • Hmm. Would you be so kind as to show an example of bit-shifting? And to clarify... After I check the machine's endianness using this... `#include <iostream> using namespace std; int main() { int num = 1; if (static_cast(num) == 1) { cout << "Little-Endian"; } else { cout << "Big-Endian"; } return 0; }` I just write the bytes individually in order of its endianness? – Cyclone Mar 31 '20 at 07:18
  • @Cyclone You don't need to check the machines endianness if you use the algorithm that I describe. Bit shifting is done with the operators `<<` and `>>` – eerorika Mar 31 '20 at 07:21

From Wikipedia:

In its most common usage, endianness indicates the ordering of bytes within a multi-byte number.

So for char a[2] = { 0x5c, 0x7B };, a[1] will always be 0x7B.

However, for int a = 0x7B5C;, with char* oneByte = (char*)&a;, the value of oneByte[0] may be 0x7B or 0x5C depending on the machine, and as you can see, you have to play with casts and byte pointers to find out (bear in mind the possible undefined behaviour when accessing oneByte[1]; this is only for explanation purposes).
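The contrast above can be seen without pointer casts by copying the integer's bytes with std::memcpy. This is a small illustrative sketch: it uses std::uint16_t rather than int so the size is exactly two bytes, and first_byte_of is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Bytes stored in a char array keep the order they were written in,
// so array_bytes[1] is 0x7B on every platform.
const unsigned char array_bytes[2] = { 0x5C, 0x7B };

// Returns the byte at the lowest address of a 16-bit integer.
// For 0x7B5C this is 0x5C on a little-endian host and 0x7B on a big-endian one.
unsigned char first_byte_of(std::uint16_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy the object representation
    return bytes[0];
}
```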

Jose

One way that is used quite often is to write a 'signature' or 'magic' number as the first data in the file - typically a 16-bit integer whose value, when read back, will depend on whether or not the reading platform has the same endianness as the writing platform. If you then detect a mismatch, all data (of more than one byte) read from the file will need to be byte swapped.

Here's some outline code:

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Reverses the order of the 'length' bytes starting at 'buffer', in place.
void ByteSwap(void *buffer, size_t length)
{
    unsigned char *p = static_cast<unsigned char *>(buffer);
    for (size_t i = 0; i < length / 2; ++i) {
        unsigned char tmp = *(p + i);
        *(p + i) = *(p + length - i - 1);
        *(p + length - i - 1) = tmp;
    }
}

// Writes a 16-bit 'magic' marker in native byte order, then 'num' items of 'size' bytes each.
bool WriteData(const void *data, size_t size, size_t num, FILE *file)
{
    uint16_t magic = 0xAB12; // Something that can be tested for byte-reversal
    if (fwrite(&magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (fwrite(data, size, num, file) != num) return false;
    return true;
}

// Reads the marker, then the data; if the marker comes back byte-reversed, the file was
// written on a platform with the opposite endianness, so each item is byte-swapped.
bool ReadData(void *data, size_t size, size_t num, FILE *file)
{
    uint16_t test_magic;
    bool is_reversed;
    if (fread(&test_magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (test_magic == 0xAB12) is_reversed = false;
    else if (test_magic == 0x12AB) is_reversed = true;
    else return false; // Error - needs handling!
    if (fread(data, size, num, file) != num) return false;
    if (is_reversed && (size > 1)) {
        for (size_t i = 0; i < num; ++i) ByteSwap(static_cast<char *>(data) + (i*size), size);
    }
    return true;
}

Of course, in the real world, you wouldn't need to write/read the 'magic' number for every input/output operation - just once per file, and store the is_reversed flag for future use when reading data back.

Also, with proper use of C++ code, you would probably be using std::istream/std::ostream arguments, rather than the FILE* I have shown. But the sample I have posted has been extracted (with only very little modification) from code that I actually use in my projects (to do just this test), and conversion to better use of modern C++ should be straightforward.
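A rough stream-based adaptation of that idea, writing and checking the marker once per file and then reusing the is_reversed flag, might look like the sketch below. The names kMagic, ReverseBytes, WriteMagic, ReadMagic and ReadValue are hypothetical, and ReadValue assumes T is a trivially copyable type (such as a fixed-width integer) written in the writer's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // string streams, used in the usage example
#include <string>
#include <type_traits>

constexpr std::uint16_t kMagic = 0xAB12;  // same marker value as the FILE* version

// Reverses 'length' bytes in place.
void ReverseBytes(unsigned char* p, std::size_t length) {
    for (std::size_t i = 0; i < length / 2; ++i) {
        unsigned char tmp = p[i];
        p[i] = p[length - i - 1];
        p[length - i - 1] = tmp;
    }
}

// Call once, at the start of the file: writes the marker in native byte order.
bool WriteMagic(std::ostream& out) {
    out.write(reinterpret_cast<const char*>(&kMagic), sizeof kMagic);
    return static_cast<bool>(out);
}

// Call once per file: reads the marker and reports whether later data needs swapping.
bool ReadMagic(std::istream& in, bool& is_reversed) {
    std::uint16_t test_magic = 0;
    if (!in.read(reinterpret_cast<char*>(&test_magic), sizeof test_magic)) return false;
    if (test_magic == kMagic) { is_reversed = false; return true; }
    if (test_magic == 0x12AB) { is_reversed = true; return true; }
    return false;  // not a file written by WriteMagic
}

// Reads one value written in native order by the writer, swapping if the orders differ.
template <typename T>
bool ReadValue(std::istream& in, bool is_reversed, T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    if (!in.read(reinterpret_cast<char*>(&value), sizeof value)) return false;
    if (is_reversed) ReverseBytes(reinterpret_cast<unsigned char*>(&value), sizeof value);
    return true;
}
```

With this split, a reader calls ReadMagic once after opening the file, keeps the flag, and passes it to every subsequent ReadValue call.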

Feel free to ask for further clarification and/or explanation.

NOTE: The ByteSwap function I have provided is not ideal! It almost certainly breaks strict aliasing rules and may well cause undefined behaviour on some platforms if used carelessly. Also, it is not the most efficient method for small data units (like int variables). One could (and should) provide one's own byte-reversal function(s) to handle specific types of variables, which is a good case for overloading the function with different argument types.
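The type-specific overloads suggested in the note could be sketched along these lines. SwapBytes is a hypothetical name; each overload reverses the bytes of a fixed-width unsigned value using shifts and masks on the value itself, so no pointer casts are involved.

```cpp
#include <cstdint>

// Swap the two bytes of a 16-bit value.
inline std::uint16_t SwapBytes(std::uint16_t v) {
    // The shifts happen after promotion to int; the cast keeps the low 16 bits.
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

// Swap the four bytes of a 32-bit value.
inline std::uint32_t SwapBytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Swap the eight bytes of a 64-bit value: swap each 32-bit half, then exchange the halves.
inline std::uint64_t SwapBytes(std::uint64_t v) {
    std::uint64_t low_swapped = SwapBytes(static_cast<std::uint32_t>(v));
    std::uint64_t high_swapped = SwapBytes(static_cast<std::uint32_t>(v >> 32));
    return (low_swapped << 32) | high_swapped;
}
```

Call these with values of the exact fixed-width type (for example SwapBytes(std::uint16_t{0x7B5C})); a plain int literal would be ambiguous between the overloads. In C++23, std::byteswap from <bit> provides the same operation for integer types.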

Adrian Mole