I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.

For example,

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};

    uint16_t b = *(uint16_t*)a;
    printf("%x\n", (unsigned int)b);
    return 0;
}

prints aa15 instead of 15aa (which is what I would expect).

What's the reason behind this, and is there an easy fix?

I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.

banks

4 Answers

As mentioned in the comments, this is due to endianness.

Your machine is little-endian, which (among other things) means that multi-byte integer values are stored with the least significant byte first.

If you compiled and ran this code on a big-endian machine (e.g. a Sun SPARC), you would get the result you expect.
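
As a minimal sketch (assuming a typical little-endian host such as x86), you can see the storage order by copying a 16-bit value into a byte array with memcpy, which avoids the pointer cast entirely:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint16_t v = 0x15aa;
    uint8_t bytes[2];

    /* Copy the object representation of v into a byte array. */
    memcpy(bytes, &v, sizeof v);

    /* On a little-endian machine this prints "aa 15";
       on a big-endian machine it would print "15 aa". */
    printf("%02x %02x\n", bytes[0], bytes[1]);
    return 0;
}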

Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:

uint16_t b = ntohs(*(uint16_t*)a);

There are similar functions called ntohl and htonl that work on 32-bit values.
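
A fuller sketch (assuming a POSIX system, where ntohs is declared in arpa/inet.h; memcpy is used instead of the pointer cast to sidestep alignment and aliasing concerns):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohs/htons on POSIX systems */

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t raw;

    /* The array holds the value in network (big-endian) order. */
    memcpy(&raw, a, sizeof raw);

    /* Convert from network order to the host's byte order. */
    uint16_t b = ntohs(raw);
    printf("%x\n", (unsigned int)b);  /* prints 15aa on any host */
    return 0;
}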

dbush

This is because of the endianness of your machine.

To make your code independent of the machine's byte order, consider the following function, which detects endianness at run time:

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN    1

/* Store the value 1 in an int and look at its first byte:
   on a little-endian machine the least significant byte sits
   at the lowest address, so p[0] will be 1. */
int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}

So for each case you can choose which operation to apply.
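
For example (a minimal sketch that relies on the endian() function and macros above; the function name read_be16 is just an illustrative choice), you could read the array as in the question and then swap the bytes only when the host is little-endian:

#include <stdint.h>

/* Interpret a[0] as the high byte and a[1] as the low byte,
   regardless of the host's byte order. */
uint16_t read_be16(const uint8_t a[2])
{
    uint16_t b = *(const uint16_t *)a;  /* raw reinterpretation, as in the question */

    if (endian() == LITTLE_ENDIAN)
        b = (uint16_t)((b >> 8) | (b << 8));  /* swap the two bytes */

    return b;
}

Note that the aliasing concern raised in the next answer still applies to the cast here.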

Felipe Sulser

You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.

A correct version of the code could be:

b = ((uint16_t)a[0] << CHAR_BIT) + a[1];

The version suggested in your question involving a[0] << 8 is incorrect because, on a system with 16-bit int, it may cause signed integer overflow: a[0] is promoted to int, and << 8 multiplies it by 256, which exceeds INT_MAX whenever a[0] is 0x80 or greater.
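
Put together, a minimal sketch (CHAR_BIT comes from limits.h; the cast to uint16_t ensures the left operand is promoted to an unsigned type even when int is only 16 bits wide):

#include <stdint.h>
#include <stdio.h>
#include <limits.h>

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};

    /* a[0] is the high byte, a[1] the low byte, independent of
       the host's endianness. */
    uint16_t b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
    printf("%x\n", (unsigned int)b);  /* prints 15aa */
    return 0;
}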

M.M

This might help to visualize things. When you create the array, the two bytes are stored in the order you wrote them. When you print the value, you get the human-readable hex form, which is the opposite of the little-endian order in which it was stored. The value 1 as a little-endian uint16_t is stored as follows, where a0 is a lower address than a1...

 a0       a1
|10000000|00000000

Note that the least significant byte comes first, but when we print the value in hex, the least significant byte appears on the right, which is what we normally expect on any machine.

This program prints a little-endian and a big-endian 1 in binary, starting from the least significant bit...

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>

void print_bin(uint64_t num, size_t bytes) {
  int i;
  /* Print the value one bit at a time, least significant bit first,
     with '|' marking the start of each byte. */
  for (i = bytes * 8; i > 0; i--) {
    if (i % 8 == 0)
      printf("|");
    printf((num & 1) ? "1" : "0");
    num >>= 1;
  }
  printf("\n");
}
int main(void) {
  uint8_t a[2] = {0x15, 0xaa};
  uint16_t b = *(uint16_t*)a;
  uint16_t le = 1;
  uint16_t be = htons(le);

  printf("Little Endian 1\n");
  print_bin(le, 2); 
  printf("Big Endian 1 on little endian machine\n");
  print_bin(be, 2); 
  printf("0xaa15 as little endian\n");
  print_bin(b, 2); 
  return 0;
}

This is the output (printed least significant bit first):

Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101
Harry
  • What has `uint64_t` to do with either the question or the 2-byte array? This seems to have over-complicated a simple issue. – Weather Vane Apr 27 '16 at 17:32
  • Sorry, my DV because `1` is never stored as `10000000` – Weather Vane Apr 27 '16 at 17:39
  • I don't understand. I didn't say it was stored as `100000001 ` I used two bytes to demonstrate that it was stored with the first byte appearing to the left. – Harry Apr 27 '16 at 17:40
  • The first byte on the left binary `10000000` is decimal `128`. You are bent on confusing OP further. – Weather Vane Apr 27 '16 at 17:43
  • No, the first byte to the machine has value `1`. If you read it like a human it has value 128 but I'm talking about the byte representation that the machine has in RAM. – Harry Apr 27 '16 at 17:44
  • Nobody writes bits as little-endian, whatever the byte endianness. – Weather Vane Apr 27 '16 at 17:53
  • @WeatherVane I just updated the answer to display a little endian and big endian `1`. Please explain why the bit is set where it is in big endian format if nobody write it this way? – Harry Apr 27 '16 at 17:56
  • Because bit 7 is decimal `128`, not `1`. Surely with your rep, you know that? – Weather Vane Apr 27 '16 at 17:57
  • My understanding on Intel is it's consistently little endian ie the lowest address of bit 0 == the lowest address of byte 0. – Harry Apr 27 '16 at 22:33
  • @WeatherVane My wife assures me that reputation gained answering questions on stackoverflow is inversely related to intelligence :) – Harry Apr 28 '16 at 03:45