
Possible Duplicate:
Detecting endianness programmatically in a C++ program

I am working on a C++ project that requires that I know if the system is big endian or little endian.

I have come up with some code that I think detects this. However, this is my first time writing code like this, and I'd like to know whether it would actually work:

```cpp
#include <iostream>
using namespace std;

int main()
{
    int fourbytesint = 0; // Initialize all four bytes to zero

    // Pretend the int is an array of 4 bytes and set the first (lowest-addressed) one to 1
    ((char*)&fourbytesint)[0] = 1;

    // Depending on the endianness, this is a small number (1) or an unreasonably large one
    if (fourbytesint > 1000)
    {
        cout << "Big endian!" << endl;
    }
    else
    {
        cout << "Little Endian!" << endl;
    }
}
```

Also, my instructor taught me that char in C++ can be used to store bytes. I am a little wary of this, because in languages like Java, char stores two-byte Unicode characters.

Am I correct in using char as a byte in the above example?

Georges Oates Larsen
  • Are you sure you need to know your endianness? What for? – Kerrek SB Jan 12 '12 at 21:20
  • This has been asked before, numerous times. http://stackoverflow.com/q/1001307/1078151 – That Chuck Guy Jan 12 '12 at 21:21
  • I'm doing an assignment for school, and my instructor has given me the task of writing something that interfaces with his data structure. His data structure is in the form of a raw block of byte data, and due to its nature, it depends on whether the machine is big endian or little endian. He has provided two copies of this data, one for big endian systems and one for little endian systems. I run Mac OS X, so it can be difficult to tell which one to use. I simply want my code to choose for me. – Georges Oates Larsen Jan 12 '12 at 21:23
  • @ThatChuckGuy I am more or less asking whether my code specifically would work – Georges Oates Larsen Jan 12 '12 at 21:23
    @GeorgesOatesLarsen, if you're on an Intel Mac (i.e., any Mac sold in the last few years), use little endian – bdonlan Jan 12 '12 at 21:24
  • If the question is specific to your code would it be better on codereview.stackexchange.com ? – AnnanFay Jan 12 '12 at 21:25
  • @Annan Oh, sorry, I wasn't aware that existed! – Georges Oates Larsen Jan 12 '12 at 21:27
  • BTW, for the restricted case of just these two platforms, `1 == htons(1)` works nicely: if it evaluates to false (`0`), the machine is little endian... – Nim Jan 12 '12 at 21:47
  • While this may be a fun exercise note that you can always read and write data *without* knowing your machine implementation details -- all you need to know is the *external* format of the serialized data. – Kerrek SB Jan 12 '12 at 22:44

6 Answers


The easiest way IMHO is to use the htonl() function.

On a big endian machine, htonl() is a no-op, so your value comes back unchanged; on a little endian machine, its bytes are reversed.
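A minimal sketch of that idea, assuming a POSIX system (such as OS X or Linux) where htonl() is declared in <arpa/inet.h>. The function names are made up for illustration. The probe value uses four different bytes, so a host with some other byte order is reported as neither big nor little endian:

```cpp
#include <arpa/inet.h>  // htonl (POSIX)
#include <cstdint>

// Hypothetical helper: on a big-endian host, htonl() returns its argument unchanged.
bool hostIsBigEndian()
{
    const std::uint32_t probe = 0x01020304u;  // four distinct bytes
    return htonl(probe) == probe;
}

// Hypothetical helper: on a little-endian host, htonl() reverses all four bytes.
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 0x01020304u;
    return htonl(probe) == 0x04030201u;
}
```

If both functions return false, the host uses neither of the two common layouts.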

Brian Roach
    However you must be careful to pass a value where all octets are different when testing, to properly detect mixed-endian systems. – bdonlan Jan 12 '12 at 21:23

Yes, your method of detection works (although you may be able to write your code in an endian-independent way, for example using htonl() as Brian Roach suggests; that would also keep your code working when big endian and little endian aren't the only options).
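For example, if you know the byte order of the data format (as with the instructor's separate big endian and little endian copies), you can decode values with shifts. Shifts operate on values rather than on memory layout, so they give the same result on any host. A sketch, assuming each value is a 32-bit unsigned integer stored as four 8-bit bytes; the function names are hypothetical:

```cpp
#include <cstdint>

// Decode 4 bytes stored most significant byte first (big endian).
std::uint32_t readBigEndian32(const unsigned char* bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}

// Decode 4 bytes stored least significant byte first (little endian).
std::uint32_t readLittleEndian32(const unsigned char* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])        |
           (static_cast<std::uint32_t>(bytes[1]) << 8)  |
           (static_cast<std::uint32_t>(bytes[2]) << 16) |
           (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

With this approach you would only need one copy of the data, whichever order it was written in.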

Yes, char in C++ is one byte. By definition, sizeof(char) == 1.
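A small sketch of what the standard does and does not promise: sizeof counts in units of char, so sizeof(char) is always 1, but the number of bits in that byte is CHAR_BIT (from <climits>), which is at least 8 and may be larger. This differs from Java, where char is a 16-bit type. The constant name below is only for illustration:

```cpp
#include <climits>  // CHAR_BIT

// sizeof measures sizes in units of char, so this holds on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// A C++ byte (one char) has at least 8 bits; some platforms use more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical constant: bits of storage an int occupies on this platform.
constexpr int intStorageBits = static_cast<int>(sizeof(int) * CHAR_BIT);
```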

Edit: as pointed out in the comments, it's possible that sizeof(char) == sizeof(int). Your code would report this case as little endian, because setting the int's only byte to 1 makes the whole int equal to 1. But the concept of endianness doesn't make sense when the storage for a type doesn't span multiple addressable locations. 'Big endian' means that the 'sub-units' of the int are ordered so that the more significant ones come before the less significant ones, so the term doesn't apply to something that has only one sub-unit. Still, you could write the code in either branch so that it handles this case, and then it won't matter.

bames53
  • Ahh thank you, you answered all my questions! :) – Georges Oates Larsen Jan 12 '12 at 21:25
    while `sizeof(char) == 1` holds by definition, there are absolutely no guarantees that `sizeof(int) != sizeof(char)`, so that doesn't really say anything (I think I've read about a platform where `char` was 64-bit) – Grizzly Jan 12 '12 at 21:33
    @Grizzly Well `char` is the size of the smallest addressable unit in C++, and that's the definition of a byte. (e.g. there have been systems with 9 bit bytes, and apparently 64 bit bytes if what you say is true). – bames53 Jan 12 '12 at 21:45
  • @bames53: I was just remarking that `sizeof(char)` always being one doesn't really help the OP's code – Grizzly Jan 12 '12 at 21:51
  • @bames53, Grizzly is correct that, on some platforms, `sizeof(int) == sizeof(char)`, which means `sizeof(int) == 1` (and that `CHAR_BIT >= 16`). This usually only happens on weird embedded platforms like DSPs, but it does happen, and a library meant to be highly portable has to consider this kind of weirdness. – bdonlan Jan 12 '12 at 22:43
  • @Grizzly Ah yes, you're right that the check could conceivably not work because of that. – bames53 Jan 13 '12 at 00:17
  • Although, come to think of it, endianness doesn't make sense when sizeof(int)==sizeof(char). – bames53 Jan 13 '12 at 13:48

On Mac OS X, your code will work to detect little endian or big endian systems.

Historically, there has been another category: 'mixed endian' systems, where bytes are stored in an unusual order (not 4321 or 1234, but something like 2143). Your code might not detect this correctly: fourbytesint might end up being, say, 256, which your > 1000 test would report as little endian. It's best to use equality checks for code like this. However, since OS X does not run on mixed endian systems, it's not a practical problem for your purposes.
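One way to apply equality checks, sketched under the assumption that std::uint32_t exists and occupies four 8-bit bytes: copy a value with four distinct bytes into an unsigned char array and compare the whole byte sequence against the big endian and little endian layouts. The ByteOrder enum and the function name are hypothetical:

```cpp
#include <cstdint>
#include <cstring>  // std::memcpy, std::memcmp

// Hypothetical result type for this sketch.
enum class ByteOrder { Big, Little, Mixed };

// This sketch assumes 8-bit bytes, so std::uint32_t occupies exactly four of them.
static_assert(sizeof(std::uint32_t) == 4, "this sketch assumes 8-bit bytes");

ByteOrder detectByteOrder()
{
    const std::uint32_t probe = 0x01020304u;   // four distinct byte values
    unsigned char bytes[4];
    std::memcpy(bytes, &probe, sizeof bytes);  // copy out the in-memory byte sequence

    // Exact comparisons against the two common layouts; anything else is mixed.
    const unsigned char bigLayout[4]    = {0x01, 0x02, 0x03, 0x04};
    const unsigned char littleLayout[4] = {0x04, 0x03, 0x02, 0x01};
    if (std::memcmp(bytes, bigLayout, sizeof bytes) == 0)
        return ByteOrder::Big;
    if (std::memcmp(bytes, littleLayout, sizeof bytes) == 0)
        return ByteOrder::Little;
    return ByteOrder::Mixed;  // e.g. a 2143-style layout
}
```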

Additionally, on OS X you would be better off using htonl (from <arpa/inet.h>) for this detection. On many Unix-like systems, htonl can be optimized away at compile time, removing the runtime overhead of a test like this.

bdonlan
  • Most detailed answer yet! I would have accepted yours, except that a different answer directly answers my two questions. Thank you for the information on mixed endian :) I wish I could star answers – Georges Oates Larsen Jan 12 '12 at 21:37
    @GeorgesOatesLarsen you can :) just click the up arrow at the upper left corner of the answer. – davogotland Jan 12 '12 at 22:07

I would do the following to check endianness:

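```cpp
bool isLittleEndian()
{
   unsigned num = 0xABCD;  // fits in any unsigned int (at least 16 bits)

   // On a little endian machine the lowest-addressed byte is the least significant one, 0xCD
   return *((unsigned char*)&num) == 0xCD;
}
```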

As others have stated, sizeof(char) does equal 1.

All the C++ standard says about unsigned int is that it must be able to hold values from 0 to 0xFFFF (that is, it is at least 16 bits wide), which is why I used the value 0xABCD rather than one that needs 4 bytes (I could just as well have used 0xCD).

And unsigned char instead of a signed one is better for representing bytes, since you most likely want the raw, unsigned byte value.
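A sketch of why this matters, assuming 8-bit bytes: whether plain char is signed is implementation-defined, and where it is signed, a byte such as 0xCD reads back as a negative value on mainstream compilers (-51) and then compares unequal to 0xCD. The helper names are illustrative only:

```cpp
#include <climits>  // CHAR_MIN

// Plain char is signed exactly when CHAR_MIN is negative.
constexpr bool plainCharIsSigned = CHAR_MIN < 0;

// Store the raw byte in a plain char, then compare with 0xCD (205).
// If char is signed, c holds a negative value, which is promoted to int and differs from 205.
bool equalsCDViaPlainChar(unsigned char raw)
{
    const char c = static_cast<char>(raw);
    return c == 0xCD;
}

// Through unsigned char the raw byte value is preserved, so the comparison behaves as expected.
bool equalsCDViaUnsignedChar(unsigned char raw)
{
    return raw == 0xCD;
}
```

On typical x86 compilers plain char is signed, so the plain-char version returns false there.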

AusCBloke

The other way around might be easier: create an int and set it to 1. Then view it as bytes through a byte (char) pointer and check whether the first or the last byte holds the value 1 (the others will be 0).
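A minimal sketch of that approach, reading the bytes through an unsigned char pointer (which is allowed for inspecting any object). The enum and function names are hypothetical; the sketch also reports when the 1 lands in neither end byte (a mixed endian layout) or when int is a single byte:

```cpp
#include <cstddef>  // std::size_t

// Hypothetical result type for this sketch.
enum class OnePosition { FirstByte, LastByte, Neither };

// Store 1 in an int, then look at its bytes to see which end holds the 1.
OnePosition whereIsTheOne()
{
    const int value = 1;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    const std::size_t last = sizeof value - 1;

    if (sizeof value == 1)
        return OnePosition::Neither;    // single-byte int: no byte order to detect
    if (bytes[0] == 1)
        return OnePosition::FirstByte;  // little endian
    if (bytes[last] == 1)
        return OnePosition::LastByte;   // big endian
    return OnePosition::Neither;        // 1 is in a middle byte: mixed endian
}
```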

davogotland

Your code is correct, but it is too complicated.
This is simpler and easier to read.

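```cpp
#include <iostream>
using std::cout;

int main()
{
    short var = 0x1;              // only the least significant byte is non-zero
    char * byte = (char *) &var;  // view the short as an array of bytes
    if(byte[0] > 0){              // is the least significant byte stored first?
       cout << "Little Endian";
    }
    else{
       cout << "Big Endian";
    }
}
```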
Cratylus
  • I would say our code is about the same in complexity and readability, though using a short instead of an int is an interesting idea – Georges Oates Larsen Jan 12 '12 at 21:29
  • Depends. Your code may be clear to you and me, but the second line may confuse someone else (I certainly avoid it). I always think code should be easy for the next developer to understand – Cratylus Jan 12 '12 at 21:32
  • right, this is exactly what i meant ^^ – davogotland Jan 12 '12 at 22:09