2

I encountered the following program and was not able to understand how the output comes to be -109 1683. How is the output coming out this way?

    #include <stdio.h>

    int main()
    {
        int k = 1683;
        char *a = (char *)&k;
        int *l = &k;
        printf("%d ", *a);
        printf("%d", *l);
    }

Output is: -109 1683

How does dereferencing the pointer a give me -109?

I expected it to read the first byte of the four-byte integer.

1683 in binary representation is 00000000 00000000 00000110 10010011. So reading the first byte means the output should be 0 1683. What is happening behind the scenes? I heard something about architecture endianness, but could not follow it.

  • That's because of converting a pointer to int into a pointer to char. Notice that int is 4 bytes long, and char is 1 byte long. – gravell Jan 19 '17 at 08:02
  • What did you expect? – Stargateur Jan 19 '17 at 08:02
  • Can you explain step by step why it is coming out to -109? –  Jan 19 '17 at 08:02
  • The first byte is `10010011` not `00000000` on little endian – Stargateur Jan 19 '17 at 08:07
  • There is no "pointer `p`" – M.M Jan 19 '17 at 12:23
  • Except for the choice of constant (1234 vs 1683), the code in the two questions is functionally equivalent, down to having the same bug of not ending the output with a newline. The basic issue is the same. You're lucky you didn't get a zero from the `char *` variable; had you been working on a big-endian machine, that is what you'd have got. You might also have gotten 147 as the output if the plain `char` type was unsigned; clearly, on your machine, plain `char` is signed. – Jonathan Leffler Jan 22 '17 at 15:53

6 Answers

6

The integer 1683 is equal to 0x00000693 (int is usually 32 bits, i.e. 4 bytes, on modern systems). On a little-endian system (like x86 and x86-64) it's laid out in memory like this:

+------+------+------+------+
| 0x93 | 0x06 | 0x00 | 0x00 |
+------+------+------+------+

When you initialize the pointer a you make it point to the first byte in that sequence, the byte which contains the value 0x93, so that's the value you get when you dereference a.

So now to the question of how the value 0x93 becomes -109. There are two reasons: one is that char with your compiler is signed (it is implementation-defined whether char is a signed or unsigned type). The second reason is two's complement arithmetic, the scheme modern binary computers use to encode signed integers.

Basically, for a single 8-bit byte, you get the negative value by taking the (unsigned) decimal value minus 256 (2^8). The unsigned decimal value of 0x93 is 147, and 147 - 256 equals -109.
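If you want to see this on your own machine, here is a minimal sketch (assuming a little-endian system with 8-bit bytes; the signed result is what a typical two's-complement implementation with signed plain char gives):

    #include <stdio.h>

    int main(void)
    {
        int k = 1683;                               /* 0x00000693 */
        unsigned char *bytes = (unsigned char *)&k;

        /* Print each byte of k as it sits in memory; on a little-endian
           machine the first byte printed is 0x93. */
        for (size_t i = 0; i < sizeof k; i++)
            printf("byte %zu: 0x%02X\n", i, (unsigned)bytes[i]);

        /* Reinterpret the first byte: unsigned it is 147, and as a signed
           8-bit value (two's complement) it becomes 147 - 256 = -109. */
        printf("unsigned: %d, signed: %d\n", bytes[0], (signed char)bytes[0]);
        return 0;
    }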

Some programmer dude
3

int is at least 16 bits, while char is always 1 byte (not necessarily 8 bits). The binary form of the number 1683 is 00000110 10010011. Since you're using a char*, it will point to the first byte. But which byte comes first? Which one will the char refer to, 00000110 or 10010011? It depends:

1) If your system uses little-endian byte order, it will be the latter, 10010011.

00000110 10010011
         ^^^^^^^^

Since you are using a signed char type, the most-significant bit is used as the sign bit, i.e. if it's 1, the byte represents a negative number. To get the human-readable value, aka the base-10 number, you apply two's complement. In the end you get -109.

2) If your system uses big-endian byte order and a 16-bit int, it will be the former, 00000110. This is easy: its base-10 value is 6.

00000110 10010011
^^^^^^^^

If it uses 32-bit int, it will be zero:

00000000 00000000 00000110 10010011
^^^^^^^^
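If you are not sure which byte order your own machine uses, a small check like this (a sketch, assuming CHAR_BIT is 8 and sizeof(int) > 1) will tell you:

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 1;                     /* 0x00000001 */
        unsigned char *p = (unsigned char *)&x;

        /* On a little-endian machine the least-significant byte is stored
           first, so *p is 1; on a big-endian machine it is 0. */
        printf("%s-endian\n", *p == 1 ? "little" : "big");
        return 0;
    }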
raymai97
2

Note that 1683 = 0x693.

If we assume that:

  • Your HW architecture is Little-Endian
  • Your platform defines CHAR_BIT as 8

Then the first byte of 0x00000693 in memory is 0x93.

At this point, note that:

  • Interpreted as an unsigned value: 0x93 = 147
  • Interpreted as a signed two's-complement value: 0x93 = 147 - 256 = -109
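As a quick way to confirm those assumptions on your platform, here is a sketch using <limits.h> (the last line's output is what a typical two's-complement implementation gives):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("CHAR_BIT = %d\n", CHAR_BIT);      /* 8 on common platforms */
        printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");

        unsigned char u = 0x93;   /* 147 */
        signed char   s = 0x93;   /* out of range for signed char: result is
                                     implementation-defined, typically -109 */
        printf("unsigned: %d, signed: %d\n", u, s);
        return 0;
    }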
barak manos
1

The types 'char' and 'int' have different sizes on most platforms. Specifically, 'int' is often 32 or 64 bits (4 or 8 bytes) and 'char' only 8 bits (1 byte). When dereferencing through the 'char *' you are asking the program to interpret the memory location of your 'int' as a 'char'.

Additionally, the bytes of an 'int' are stored in memory in either little-endian or big-endian order (Google it). Thus the results of your program will differ from platform to platform.

If you are running your code on x86 (which is little-endian), you would possibly see the "correct" value, if you set your 'int' to a value less than 128.

Update: As you correctly indicate, the least-significant byte of your int is 10010011, which is 147 in decimal and thus larger than 10000000 (128 decimal). Since the top-most bit of the byte is set, the value is interpreted as a two's complement (Google it) negative value, -109.
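For example, a variant of the original program with a value below 128 (a sketch; the matching output assumes a little-endian machine) prints the same number through both pointers:

    #include <stdio.h>

    int main(void)
    {
        int k = 100;                 /* fits in one byte, sign bit clear */
        char *a = (char *)&k;
        int  *l = &k;

        /* On a little-endian machine this prints "100 100": the first byte
           already holds the whole value and its top bit is 0. */
        printf("%d %d\n", *a, *l);
        return 0;
    }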

S.C. Madsen
1

If you have something like

int k = 1683 ;
int *l = &k;

If you dereference pointer l, it will correctly read the integer's bytes, because you declared it to be a pointer to int, so it reads sizeof(int) bytes. The size of int is generally 4 bytes on 32/64-bit platforms, but it is machine dependent, which is why sizeof is the portable way to find the correct size. Now for your code:

 int k = 1683;
 char *a = (char *)&k;
 int *l  = &k;

Now pointer a points to k, but we declared it to be a pointer to char, so it will only read one byte (or however many bytes a char occupies). 1683 in binary is represented as

        00000000 00000000 00000110 10010011

Now, if your machine is little-endian, it will store the bytes in reverse order:

        10010011 00000110 00000000 00000000

10010011 is at address 00 (a hypothetical address), 00000110 is at address 01, and so on.

BE:      00   01   02   03
       +----+----+----+----+   
    k: | 00 | 00 | 06 | 93 |
       +----+----+----+----+


LE:      00   01   02   03
       +----+----+----+----+
    k: | 93 | 06 | 00 | 00 |
       +----+----+----+----+

(In Hexadecimal)

So now, if you dereference pointer a, it will read only the first byte, and the output will be -109, because the byte read is 10010011 (we pointed a signed char at it, so the most-significant bit is the sign bit: 10010011 = -128 + 16 + 2 + 1 = -109). If you dereference pointer l, it will read all the bytes of the int, since we declared it to be a pointer to int, and the output will be 1683.

Also, since l is declared as int *, *l reads sizeof(int) bytes (usually 4, depending on the machine architecture), and *(l + 1) would read that many bytes as well, starting sizeof(int) bytes further along. The same goes for char or any other data type: a pointer reads as many bytes as the size of the type it points to, and char is 1 byte.
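A small sketch of that last point: pointer arithmetic steps by the size of the pointed-to type, so a + 1 moves 1 byte while l + 1 moves sizeof(int) bytes (the printed addresses depend on your machine):

    #include <stdio.h>

    int main(void)
    {
        int k = 1683;
        char *a = (char *)&k;
        int  *l = &k;

        printf("sizeof(int) = %zu\n", sizeof(int));
        /* a + 1 is one byte past a; l + 1 is sizeof(int) bytes past l.
           (Don't dereference l + 1 here: it points just past k.) */
        printf("a = %p, a + 1 = %p\n", (void *)a, (void *)(a + 1));
        printf("l = %p, l + 1 = %p\n", (void *)l, (void *)(l + 1));
        return 0;
    }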

Suraj Jain
0

It happens because of byte order: https://en.wikipedia.org/wiki/Endianness

You actually take the address of the byte 10010011, which, read as a signed char, is -109.

FunkyCat
  • Please write out the answer here instead of only providing a link, as links can go dead at any time. –  Jan 19 '17 at 14:14