0

I'm trying to write a program that detects endianness and returns the endianness type (1 for little, 0 for big), or -1 if it is neither. But I ran into this problem: when I mask an unsigned long word with an unsigned char and then compare that char to its ASCII value, the code inside the if condition apparently becomes unreachable...

int is_little_endian() {
    unsigned long word = 0x6600000000000088;
    unsigned char maskedWord = word;
    if (maskedWord == 'X') {
        return 0;
    } else if (maskedWord == 'B') {
        return 1;
    } else return -1;
}

Thanks!

Blur
  • removed my code picture – Blur Nov 19 '17 at 23:34
  • i.e. **post the actual code**, not pictures of the code – Antti Haapala -- Слава Україні Nov 19 '17 at 23:38
  • Use the implementation-specific macros. Or use compliant code. This one is broken in many aspects: 1) assumes ASCII encoding 2) possibly too large integer constant for `unsigned long`, 3) The 2nd initialiser should generate a conversion warning (always enable all recommended warnings and treat them as errors!), unless `char` has more than 62 bits on your platform. It also does not work for obvious reasons. Finally: the name implies a boolean result, but you return three results. That's a bad naming scheme or approach. – too honest for this site Nov 20 '17 at 00:54
  • What is your question? I guess you might refer to some compiler warning message ; if so then post the message (and make sure your code is a [MCVE](http://stackoverflow.com/help/mcve)) – M.M Nov 20 '17 at 02:30
  • You should try to stick to a 32-bit value when doing this for portability reasons. Also be aware of the fact that some CPUs have the capability of switching data endianness at runtime (i.e. toggling a CPU flag can switch data from big to little and vice-versa), and some can even store a value in a "middle-endian" byte order when an unaligned 32-bit write is performed in a particular way: [example of middle-endian issue (Twitter)](https://twitter.com/isislovecruft/status/455924593711411200) and [Wikipedia info on middle-endian data format](https://en.wikipedia.org/wiki/Endianness#Middle-endian) –  Nov 20 '17 at 03:39
  • A very clever answer is given at the following link, which uses `int isBigEndian = (htonl(X) == X);` https://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program/1001330#1001330 – lockcmpxchg8b Nov 21 '17 at 01:44

4 Answers

2

You can check endianness without using a union, with any type that takes at least two bytes:

uint32_t val = 1;
int big_endian = !(*(char *)&val);

In memory (for a 4-byte int), big endian would be:

     val:  00 00 00 01
- ============(addresses)======> +

Little endian:

     val:  01 00 00 00
- ============(addresses)======> +
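
As a rough sketch, the same first-byte check can be applied to the `unsigned long` the question requires, without `stdint.h`. The function name `first_byte_is_lsb` is made up for illustration. Since it only looks at one byte, it cannot tell a mixed ("middle-endian") layout apart from the two common ones.

```c
/* Hypothetical helper: returns 1 when the byte at the lowest address is the
 * least significant one (little endian), 0 otherwise. */
int first_byte_is_lsb(void)
{
    unsigned long val = 1;

    /* Inspecting an object through an unsigned char pointer is allowed;
     * this reads the byte stored at the lowest address of val. */
    return *(unsigned char *)&val == 1;
}
```

Calling `first_byte_is_lsb()` gives the 1-for-little, 0-for-big convention from the question, minus the -1 case.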
Déjà vu
  • Some ARM machines can still be a form of "middle-endian" (i.e. `0B 0A 0D 0C` or `0C 0D 0A 0B` to represent `0x0A0B0C0D`) when dealing with unaligned data. This usually won't be a problem, but it's not like there's a guarantee it won't happen. A reply to [this annoyed Twitter post](https://twitter.com/isislovecruft/status/455924593711411200) illustrates how such a thing might happen; the idea is that you perform an unaligned 32-bit write to a 16-bit-aligned address (i.e. `addr % 4 == 2`). Shouldn't happen ordinarily, but it's clearly possible. Some CPUs can switch endianness at runtime too :( –  Nov 20 '17 at 03:31
  • @lockcmpxchg8b this is legal by the standards and no compiler should reject it. – Ajay Brahmakshatriya Nov 21 '17 at 02:35
  • I stand corrected. It does indeed compile with `gcc -ansi -pedantic -Wall -Werror`. I deleted the comment. (and upvoted this solution) – lockcmpxchg8b Nov 21 '17 at 02:57
1

This code won't successfully test for endianness.

You define unsigned long word = 0x6600000000000088. Ignoring for a moment that this constant might be too large for an unsigned long, when you assign that value to an unsigned char it gets truncated modulo 256, so maskedWord will always equal 0x88.
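
To see why the byte order never enters into it, here is a minimal sketch of that conversion on its own. `low_byte` is a hypothetical name, the sketch assumes 8-bit `char`, and it takes `unsigned long long` so the constant is guaranteed to fit.

```c
/* Hypothetical helper performing the same conversion as the question's code. */
unsigned char low_byte(unsigned long long word)
{
    /* Converting to an unsigned type works on the value, not on memory:
     * the result is the value reduced modulo UCHAR_MAX + 1, i.e. the
     * low-order bits, whatever the byte order of the machine. */
    return (unsigned char)word;
}
```

With the question's constant, `low_byte(0x6600000000000088ULL)` is 0x88 on little and big endian machines alike, so neither branch of the `if` can match.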

To do a proper endianness test, you need to create a union of a char array and a fixed size integer and assign bytes to the char array, then check the value of the integer.

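#include <stdint.h>
#include <stdio.h>

union echeck {
    unsigned char bytes[4];
    uint32_t val;
};

int main(void)
{
    /* Fill the bytes in address order, then read them back as one integer. */
    union echeck e = { .bytes = { 0x01, 0x02, 0x03, 0x04 } };
    if (e.val == 0x01020304) {
        printf("big endian\n");
    } else if (e.val == 0x04030201) {
        printf("little endian\n");
    } else {
        printf("neither big nor little endian\n");
    }
    return 0;
}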
dbush
  • The behavior of C is technically undefined when accessing a union member that was not the last member assigned. That said, every compiler I've seen will compile this the way you expect. – lockcmpxchg8b Nov 21 '17 at 02:10
  • @lockcmpxchg8b AFAIK it was defined post C99. But I would rather use dereferencing by casting a pointer to `char*` here. – Ajay Brahmakshatriya Nov 21 '17 at 02:29
0

The code is quite wrong. The result of converting 0x6600000000000088 to unsigned char will result in 0x88 on octet-addressable platforms, be they little or big or middle-endian.

Then another problem is that 'B' is 66 on an ASCII machine, yes, and 'X' is 88 - but in decimal. The 66 and 88 in your program are in hex, though. 0x66 corresponds to 'f' and 0x88 is some extended character outside ASCII.
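
For illustration only, this is roughly what the question's function might look like with the ASCII codes written in hexadecimal ('B' is 0x42, 'X' is 0x58) and with the first byte read from memory rather than obtained by converting the value. The name `ascii_endianness` is made up, and the sketch assumes ASCII and a 64-bit `unsigned long long`.

```c
/* Hypothetical rewrite of the question's function.
 * Assumes ASCII and a 64-bit unsigned long long. */
int ascii_endianness(void)
{
    /* 'B' is 0x42 and 'X' is 0x58 in ASCII. */
    unsigned long long word = 0x4200000000000058ULL;
    /* Read the byte stored at the lowest address instead of converting the value. */
    unsigned char first = *(unsigned char *)&word;

    if (first == 'X')
        return 1;   /* least significant byte first: little endian */
    if (first == 'B')
        return 0;   /* most significant byte first: big endian */
    return -1;      /* some other layout */
}
```

Note that the return values follow the convention stated in the question (1 for little, 0 for big); the posted code had the two branches the other way round.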

Instead of all this mess, use simply

union {
    uint64_t test_value;
    unsigned char bytes[sizeof(uint64_t)];
} detect = { .test_value = 0x0102030405060708 };

and check the values of detect.bytes[0] to detect.bytes[7].
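
One possible way to do that check, as a sketch: compare every byte against the sequence expected for each order. `byte_order_from_union` is a hypothetical name, the 1/0/-1 return values are borrowed from the question, and 8-bit bytes are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 = little endian, 0 = big endian, -1 = anything else. */
int byte_order_from_union(void)
{
    union {
        uint64_t test_value;
        unsigned char bytes[sizeof(uint64_t)];
    } detect = { .test_value = 0x0102030405060708 };
    int big = 1, little = 1;
    size_t i;

    for (i = 0; i < sizeof detect.bytes; i++) {
        if (detect.bytes[i] != i + 1)
            big = 0;        /* big endian stores 01 02 ... 08 */
        if (detect.bytes[i] != 8 - i)
            little = 0;     /* little endian stores 08 07 ... 01 */
    }
    return little ? 1 : big ? 0 : -1;
}
```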

too honest for this site
  • we are not allowed to use outside libraries, and the word has to be long... – Blur Nov 20 '17 at 00:39
  • @Blur: `stdint.h` is not an "outside library", but a standard library which is mandatory for every implementation. If your assignment disallows that, you are learning bad coding style. – too honest for this site Nov 20 '17 at 00:57
  • There is no need to use the (likely) largest standard type. `uint16_t` would be sufficient. Also use `uint8_t` instead of `unsigned char`. There is no guarantee `unsigned char` has 8 bits and that way a compile error will show if the platform does not support 8 bit types. While this might sound nitpicky, it does not add any extra effort/code and a beginner should always learn to write good code - the exceptions from this rule will come later once he knows when to bend the rules **and why**. – too honest for this site Nov 20 '17 at 01:00
  • In general, using a compiler-macro would be the better approach, **iff** such testing is required at all. Typically one writes the code such that it runs correctly on both endiannesses. (On a side note: `sizeof` here requires the parentheses.) – too honest for this site Nov 20 '17 at 01:02
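
A sketch of the compiler-macro approach mentioned in the last comment, assuming GCC or Clang, which predefine `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__`. Other compilers may not provide these, so the sketch falls back to -1 there. The function name is hypothetical.

```c
/* Hypothetical helper: 1 = little endian, 0 = big endian, -1 = other/unknown.
 * Relies on macros predefined by GCC and Clang; nothing is tested at run time. */
int compile_time_endianness(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return 1;
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 0;
#  else
    return -1;  /* some other order reported by the compiler */
#  endif
#else
    return -1;  /* macros not available on this compiler */
#endif
}
```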
0

Here's a candidate that I think honors the C type-punning rules, presuming that memcpy is okay.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  uint32_t test;
  uint8_t trial[4] = {0x01, 0x23, 0x45, 0x67};

  memcpy(&test, trial, 4);
  switch(test)
  {
    case 0x01234567: printf("Big Endian\n"); break;
    case 0x67452301: printf("Little Endian\n"); break;
    case 0x45670123:
    case 0x23016745: printf("Middle Endian\n"); break;
    default: printf("WTF?\n"); break;
  }
}

I suppose if you have to use long, you could split it into cases for sizeof(long) = 4 and sizeof(long) = 8...
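
Something along these lines might work for the `long` variant; treat it as a sketch rather than a tested portable solution. `long_endianness` is a hypothetical name, the 1/0/-1 return values come from the question, and 8-bit bytes are assumed.

```c
#include <string.h>

/* Hypothetical helper: 1 = little endian, 0 = big endian, -1 = anything else. */
int long_endianness(void)
{
    unsigned char bytes[sizeof(unsigned long)];
    unsigned long value;
    size_t i;

    /* Bytes 01 02 03 ... in increasing address order. */
    for (i = 0; i < sizeof bytes; i++)
        bytes[i] = (unsigned char)(i + 1);

    /* memcpy copies the object representation without type punning. */
    memcpy(&value, bytes, sizeof value);

    if (sizeof value == 4) {
        if (value == 0x01020304UL) return 0;
        if (value == 0x04030201UL) return 1;
    } else if (sizeof value == 8) {
        if (value == 0x0102030405060708ULL) return 0;
        if (value == 0x0807060504030201ULL) return 1;
    }
    return -1;  /* mixed byte order or an unexpected size of long */
}
```

Only `string.h` is needed, so this stays within the "no `stdint.h`" restriction mentioned in the comments on another answer.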

lockcmpxchg8b