
Here is something weird I found:

When I have a char* s of three elements and assign it the string "21":

  1. The printed short int value of s appears to be 12594, which is 00110001 00110010 in binary, i.e. the two bytes 49 and 50. But according to the ASCII chart, the value of '2' is 50 and '1' is 49, so I expected the bytes in the order 50 49.

  2. When I shift the value right with *(short*)s >>= 8, the result agrees with (1.): it is '1', i.e. 49. But after I then assign *s = '1', the printed string of s still appears as "1", whereas I had thought it would become "11".

I am now confused about how the bytes are laid out in memory; I hope someone can explain this.

Following is the code I use:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("%lu,%lu\n",sizeof(char), sizeof(short));
  char* s = malloc(sizeof(char)*3);
  *s = '2', *(s+1) = '1', *(s+2) = '\0';
  printf("%s\n",s);
  printf("%d\n",*(short int*)s);
  *(short*)s >>= 8;
  printf("%s\n",s);
  printf("%d\n",*(short int*)s);
  *s = '1';
  printf("%s\n",s);
  return 0;
}

And the output is:

1,2
21
12594
1
49
1

This program is compiled on macOS with gcc.

Yi Lin Liu
  • Are you aware of "endianness"? – Yunnosch Jul 05 '18 at 06:02
  • no, what is "endianness"? @Yunnosch – Yi Lin Liu Jul 05 '18 at 06:03
  • Or "little endian" and "big endian"? – Yunnosch Jul 05 '18 at 06:04
  • to print the result of `sizeof` [use `%zu`](https://stackoverflow.com/q/940087/995714) – phuclv Jul 05 '18 at 06:06
  • Wikipedia: words may be represented in big-endian or little-endian format, depending on whether bits or bytes or other components are ordered from the big end (most significant bit) or the little end (least significant bit). @Yunnosch is this it? – Yi Lin Liu Jul 05 '18 at 06:06
  • Yes, that's it. I tried to write it more explicitly in the answer. – Yunnosch Jul 05 '18 at 06:12
  • [What is the strict aliasing rule?](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) – Lundin Jul 05 '18 at 06:25
  • @Lundin Nice link, I used it to improve my answer, let me know if you mind. – Yunnosch Jul 05 '18 at 06:58
  • @Yunnosch One more question: is it possible to know the endianness at compile time? – Yi Lin Liu Jul 06 '18 at 13:12
  • Yes. Define what kind of endianness you want to detect, research how it becomes visible, and test. You can do that in a function and return a bool or an enum "little" or "big". The code you have shown is in practice already such a test: just return one endianness if you get what you expect and the other if not. Doing that without risking undefined behaviour is, however, more of a challenge; maybe the community has an idea. You should ask for it in a separate question and search for duplicates beforehand. – Yunnosch Jul 06 '18 at 13:16
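
Following up on that last comment, here is a minimal sketch (not part of the original thread) of such a runtime test. It reads the lowest-addressed byte of a known 16-bit value with memcpy, which is the direction the aliasing rules allow; the function and enum names are made up for illustration:

#include <stdio.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG };   /* hypothetical names */

/* Store a value whose two bytes differ and look at the byte with the
   lowest address; memcpy avoids the pointer-cast aliasing problem. */
static enum byte_order detect_endianness(void)
{
    unsigned short probe = 0x0102;   /* high byte 0x01, low byte 0x02 */
    unsigned char first;
    memcpy(&first, &probe, 1);       /* copy the lowest-addressed byte */
    return first == 0x02 ? ORDER_LITTLE : ORDER_BIG;
}

int main(void)
{
    printf("%s endian\n",
           detect_endianness() == ORDER_LITTLE ? "little" : "big");
    return 0;
}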

2 Answers


You need some understanding of the concept of "endianness" here: values can be stored in memory in "little endian" or "big endian" order.

I am going to skip the discussion of how legal this is and of the undefined behaviour involved.
(Here is, however, a relevant link, provided by Lundin, credits:
What is the strict aliasing rule?)

But let's look at a pair of bytes in memory, where the lower-addressed one contains 50 and the higher-addressed one contains 49:

50 49

You set them up exactly this way, by explicitly writing the lower byte and the higher byte (through the char type).

Then you read them back, forcing the compiler to treat them as a short, which is a two-byte type on your system.

Compilers and hardware can be built with different "opinions" on how a two-byte value should be laid out in two consecutive bytes. That is called "endianness".

Two compilers, both of which are perfectly standard-conforming, can act like this:

The short to be returned is either

  • the value from the lower address, multiplied by 256, plus the value from the higher address (big endian), or
  • the value from the higher address, multiplied by 256, plus the value from the lower address (little endian).

They do not actually compute it that way; a much more efficient mechanism is implemented in hardware. But the point is that even the hardware implementation implicitly does one or the other.
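
A quick arithmetic check of those two readings (a sketch added for illustration, not part of the original answer), using the bytes from the question:

#include <stdio.h>

int main(void)
{
    unsigned char low_addr  = 50;  /* '2', stored at the lower address  */
    unsigned char high_addr = 49;  /* '1', stored at the higher address */

    /* Big-endian reading: the lower address holds the more significant byte. */
    int big_reading    = low_addr * 256 + high_addr;   /* 50*256 + 49 = 12849 */

    /* Little-endian reading: the higher address holds the more significant byte. */
    int little_reading = high_addr * 256 + low_addr;   /* 49*256 + 50 = 12594 */

    printf("big-endian reading:    %d\n", big_reading);
    printf("little-endian reading: %d\n", little_reading);
    return 0;
}

Only the little-endian reading reproduces the 12594 from the question's output, which is what identifies the machine as little endian.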

Yunnosch

You are reinterpreting object representations by aliasing types in a way that is not allowed by the standard: you may inspect a short value through a char array, but not the other way round. Doing that can cause weird errors with optimizing compilers, which may assume the value was never initialized, or may optimize away an entire branch of code that contains undefined behaviour.
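
If you do want to reinterpret the two bytes as a short, one way to stay within the rules is to copy the bytes into a real short object with memcpy instead of dereferencing a casted pointer. A sketch of that idea with the values from the question, assuming a 2-byte short as shown in the question's output:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[3] = { '2', '1', '\0' };
    short v;                     /* assumed to be 2 bytes, as in the question */

    memcpy(&v, s, sizeof v);     /* copy the two bytes into a real short; memcpy may alias anything */
    printf("%d\n", (int)v);      /* 12594 on a little-endian machine */

    v >>= 8;                     /* operate on the short value itself */
    memcpy(s, &v, sizeof v);     /* copy the bytes back into the char array */
    printf("%s\n", s);           /* prints "1" on a little-endian machine */
    return 0;
}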

The answer to your question is called endianness. In a big-endian representation, the most significant byte has the lowest address (258, i.e. 0x102, is stored as the two bytes 0x01, 0x02 in that order), while in a little-endian representation the least significant byte has the lowest address (0x102 is stored as 0x02, 0x01 in that order).

Your system happens to be a little endian one.
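
To see that on your machine, you can inspect the bytes of 0x0102 through an unsigned char pointer, which is the direction the standard permits (a small sketch, not part of the original answer):

#include <stdio.h>

int main(void)
{
    short x = 0x0102;
    unsigned char *p = (unsigned char *)&x;   /* viewing a short as bytes is allowed */
    printf("byte at the lower address:  0x%02x\n", (unsigned)p[0]);  /* 0x02 on little endian */
    printf("byte at the higher address: 0x%02x\n", (unsigned)p[1]);  /* 0x01 on little endian */
    return 0;
}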

Serge Ballesta