I am learning about type casting of pointers and came across this program:

#include <stdio.h>
int main(void) {
  char* p="01234567890123456789";
  int *pp = (int *)p;      /* reads the char data through an int pointer */
  printf("%d\n",pp[0]);
  return 0;
}

When I run the program above, the output is 858927408. What are these seemingly random numbers, and where do they come from? What is happening in the background, or in memory?

Edit: If I write printf("%c",pp[0]); the output is 0, which is correct, but when I change pp[0] to pp[1], the output is 4. How?

Golu
  • Given `char* p="01234567890123456789";`, `int *pp = (int *)p;` [is a strict aliasing violation](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) and undefined behavior. It's also undefined behavior if the original `char` address isn't properly aligned for an `int` value. I'd hope the source of that example makes clear that the code is not really proper. – Andrew Henle Jul 08 '19 at 10:59
  • Yes, it is mentioned in the example. – Golu Jul 08 '19 at 11:16
  • "then output is 0 which is correct" Is it? As you cheat your compiler and provide a data type that does not match the format specifier, you invoke undefined behaviour and it is questionable whether the result qualifies as "correct". – Gerhardh Jul 08 '19 at 12:16

5 Answers

If you express the result in hexadecimal (%x), you can see that:

858927408 = 0x33323130
  • 0x33 is the ASCII code for '3'
  • 0x32 is the ASCII code for '2'
  • 0x31 is the ASCII code for '1'
  • 0x30 is the ASCII code for '0'

So you are simply seeing the memory that stores "0123456...", but because your processor is little-endian, the codes appear in reverse order.

In memory, you have (in hex):

30 31 32 33 34 35 36 37 38   # 0 1 2 3 4 5 6 7 8
39 30 31 32 33 34 35 36 37   # 9 0 1 2 3 4 5 6 7
38 39 00                     # 8 9\0  

In printf("%d", ...), the first 4 bytes are read as a little-endian integer, so it displays the result of 0x33*0x1000000 + 0x32*0x10000 + 0x31*0x100 + 0x30.


With %c, things are different:

If you write printf("%c", pp[0]), you print ONE character from 0x33323130. For %c, printf converts its int argument to unsigned char, so only the low-order byte 0x30 is retained, and it displays "0", whose ASCII code is 0x30.

If you write printf("%c", pp[1]), pp[1] reads the next four bytes ("4567"), so you print ONE character from 0x37363534. Only 0x34 is retained, and it displays "4", whose ASCII code is 0x34.

Mathieu
  1. If your C implementation uses ASCII, then the first four bytes of the string "01234567890123456789" are 48, 49, 50, and 51 (hexadecimal 0x30, 0x31, 0x32, and 0x33), which are the ASCII codes for the characters “0”, “1”, “2”, and “3”.
  2. (int *)p converts p from char * to int *. Pointer conversions are not fully defined by the C standard. See the notes below. If there is no alignment problem, in most C implementations, the result of this conversion will point to the same place that p points to.
  3. Having set pp to (int *)p, pp[0] fetches the bytes at pp and interprets them as an int. In your implementation, int objects have four bytes, and bytes are ordered with the least significant byte in the lowest-addressed memory. So the bytes 0x30, 0x31, 0x32, and 0x33 are read from memory and formed into an integer 0x33323130 (decimal 858927408).

Notes About Pointer Conversions and Aliasing

Three things about pointer conversions are relevant here:

  • If the alignment is incorrect, the pointer conversion is not defined by the C standard. In particular, in many C implementations, int objects should be four-byte aligned, whereas char objects may have any alignment. If the address in p is not correctly aligned for an int, then the expression (int *)p could cause the program to crash or could cause undesired results.
  • Even if the alignment is correct, the C standard does not guarantee what the result of converting a general char * to an int * is except that converting the result back to char * will yield the original pointer (or an equivalent pointer). In many C implementations, this conversion will yield a pointer to the same address, just with a different type.
  • The expression pp[0] accesses the bytes at p as if they were an int. This violates a rule in the C standard, called the aliasing rule, that says an object shall have its value accessed only by an expression using a correct type. There are some details about what types are correct, but an int is never a correct type for a char (or for an array of char). When this rule is violated, the C standard does not define the behavior.

The last point is important because C implementations may or may not support aliasing. Some C implementations support aliasing (meaning they define the behavior even though the C standard does not) because it was widely used and they want to support existing code that uses it, or because it is needed in certain types of software. Some C implementations do not support aliasing because this allows them to optimize programs better. (If the compiler can assume that an int * never points to a float, then it may be able to avoid reloading float data after assignments through int pointers, since those assignments could not have changed the float data.) Some compilers have switches so you can enable or disable aliasing support.

Since aliasing can break your program, you should understand the rules for it, avoid it when not needed, and know how to enable it when needed. In this case, aliasing is not needed to examine the results of reinterpreting the bytes of a string as an int. A safe way to do this is to copy the bytes into an int, as with:

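#include <stdio.h>
#include <string.h>
int main(void) {
    const char *p = "01234567890123456789";
    int i;
    memcpy(&i, p, sizeof i);   /* copies the bytes; no aliasing or alignment issue */
    printf("%d\n", i);
}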
Eric Postpischil

This is the result of ((51×256+50)×256+49)×256+48, where 51 is the ASCII code of '3', 50 is the ASCII code of '2', and so on. pp[0] refers to 4 bytes of memory (int is 4 bytes on your machine), and those 4 bytes are "0123". Your machine is little-endian, so '0' (48 in numeric form) is the least significant byte and '3' is the most significant byte.

p[1] is one byte after p[0] because p is a pointer to char, but pp[1] is 4 bytes after pp[0] because pp is a pointer to int and int is 4 bytes.

Ali Tavakol

858927408 when converted to hex is 0x33323130

Apparently your system uses little-endian format, in which the least significant byte (LSB) of an integer is stored first, at the lowest address.

The first 4 bytes of the string, "0123", are taken as the integer. Their ASCII values are 0x30, 0x31, 0x32, and 0x33, respectively. Since the format is little-endian, the least significant byte of the integer is 0x30 and the most significant byte is 0x33.

That is how you get 0x33323130 as an output.

Edit: regarding the additional question from the OP

And if i write printf("%c",pp[0]); then output is 0 which is correct but when I change pp[0] to pp[1] then output is 4 but how ?

When you use %c in printf and pass an int argument, the int is converted to unsigned char, i.e., only the least significant byte, 0x30, is kept, and it is printed as the ASCII character '0'.

For pp[1], this is the next integer in the array, which starts 4 bytes later. So the least significant byte in this case is 0x34, and '4' is printed.

Rishikesh Raje

It just places the start address of the int object at the beginning of the string. The actual value of the int depends on the endianness and on sizeof(int).

Since "01234567890123456789" is {0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39 ...} in memory, if the endianness is little and sizeof(int) == 4, the value will be 0x33323130. If the endianness is big, the value will be 0x30313233.

0___________