char to integer pointer conversion

Question

void main()
{
 char *s="ABCDEFG";
 clrscr();

 int *ptr=(int *)s;
 printf("%c %d\n",*(ptr+1),*(ptr+1));          //OP :- C 17475
 printf("%c %d\n",*(s+1),*(s+1));              //OP :- B 66

 getch();
}

I know that an integer pointer increments by 2 bytes whereas a char pointer increments by 1 byte. Here, when the int pointer is incremented by 1, only C is printed (only the first byte is considered). Is that because of the %c specifier?

Also, I am not able to understand how 17475 is printed as output. In the second case, 66 is the ASCII value of B.

Can someone help me?

Possible duplicate of [Pointer Arithmetic](https://stackoverflow.com/questions/394767/pointer-arithmetic) — Jean-François Fabre, Aug 20 '17 at 06:39
Huh. 17475 is indeed the value that would print for a system using ASCII encoding, little-endian integer storage, `CHAR_BIT==8`, and `sizeof(int)==2` (the smallest `int` the Standard allows, since `INT_MAX` must be at least 32767). Where did you find this compiler? — aschepler, Aug 20 '17 at 06:48
`printf("%c %d\n",*(s+1),*(s+1));` is the same as `printf("%c %d\n", 'B', 'B');` and ASCII encoding of `'B'` is 66. The other `printf` has undefined behavior. — Support Ukraine, Aug 20 '17 at 06:51
Here when int pointer increments by 1, only C is printed (only first byte considered). Is it because we have %c specifier ? — Zephyr, Aug 20 '17 at 06:56
@Zephyr - In principle yes - it is the `%c` specifier that causes C to be printed. However, this is undefined behavior and the output can't be explained solely from the C standard. See https://stackoverflow.com/questions/12337574/about-generic-pointer-to-char-and-strict-aliasing BTW - the size of int also differs from system to system, so int pointers don't always increment by 2. It is more common that they increment by 4. — Support Ukraine, Aug 20 '17 at 07:14
@Zephyr the numbers inside a computer are meaningful only by context. The `'C'` is still `67`. The computer code that encounters `%c` at run-time has no idea how the argument was prepared for it before the function call. The characters `'C'` and `'D'` have hexadecimal values `43` and `44` respectively, in a little-endian machine the 16-bit integer is thus `0x4443` which in decimal is `17475`. But forcing the pointers like that is undefined behaviour, and anything else might have happened. — Weather Vane, Aug 20 '17 at 07:18
@Zephyr - whether we decide to write a number in decimal or hexadecimal or binary is just for convenience, i.e. to type fewer letters, e.g. instead of `10000001` (binary), we write 129 (decimal) or 81 (hexadecimal). Inside the computer everything is binary. — Support Ukraine, Aug 20 '17 at 07:28
I used hexadecimal because decimal would be clumsy: the way a decimal number is "stored" on paper implies that each digit represents a power of 10 without having to say so. But computers do not store their integers in that format - each byte represents powers of 256. In decimal, to explain the byte sequence `67 68` I would need to say `67 + 68 * 256` because the radix of byte storage is 256 not 10. It is more convenient to write it in hex as `0x4443`. — Weather Vane, Aug 20 '17 at 07:36

Support Ukraine · Accepted Answer · 2017-08-20T11:00:41.520

To start with, it is important to notice that your code has undefined behavior. That means we cannot say anything about the generated output solely by referring to the C standard. The output may differ from system to system, and some systems may not even be able to execute the code.

The problem is that you have a number of chars (a char array) but you access them through an int pointer. That is not allowed.

However, on a specific system (yours) it is possible to reason about why the output looks the way it does. Just remember that the behavior is undefined. (Note: as pointed out by Antti Haapala, the code's syntax is valid; it is only the behavior of the program that is undefined.)

The string (aka char array) will be placed somewhere in memory like:

Address |    Bin    | Hex | Dec | Ascii char
--------------------------------------------
 base   | 0100 0001 |  41 | 65  | A
 base+1 | 0100 0010 |  42 | 66  | B
 base+2 | 0100 0011 |  43 | 67  | C
 base+3 | 0100 0100 |  44 | 68  | D
 base+4 | 0100 0101 |  45 | 69  | E
 and so on

Notice that the memory holds binary values. The Hex, Dec, Ascii columns are just a "human" view of the same binary value.

Your pointer s has the value base, i.e. it points to the memory location that holds the value 0100 0001 (aka A).

Then you make ptr point to base as well.

When printing (i.e. printf("%c %d\n",*(ptr+1),*(ptr+1));), the ptr+1 will point to a location that depends on the size of integers (which differs from system to system). Since you have size of int being 2, ptr+1 is the location base + 2, i.e. 0100 0011 (aka C).

So the first part of this statement:

printf("%c %d\n",*(ptr+1),*(ptr+1));
        ^^       ^^^^^^^^

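prints a C. The %c conversion turns its int argument into an unsigned char, so only the low-order byte of the value is printed; on your system that is the byte at location base+2.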

The second part

printf("%c %d\n",*(ptr+1),*(ptr+1));
           ^^             ^^^^^^^^

prints the integer value located at base+2. (note - which is illegal as there is no integer there but let's forget that for a moment).

In your case int is two bytes. So the used bytes will be the C (hex: 0x43) and the D (hex: 0x44). The value printed will depend on the endianness of your system.

Big endian (MSB first) will give:

0x4344 which is 17220 in decimal

Little endian (LSB first) will give:

0x4443 which is 17475 in decimal

So from this it seems your system is little endian.

As you can see, a lot of this is very system dependent, and from a C standard point of view it is impossible to tell what the output will be.

One thing to note: this very much is valid C code, that follows the syntax of the standard, and might very well be accepted by the OP's compiler. It is just not *strictly-conforming*. — Antti Haapala -- Слава Україні, Aug 20 '17 at 09:12
@AnttiHaapala - True - I have added a note to my answer to address that. Thanks. — Support Ukraine, Aug 20 '17 at 11:01
I don't understand why my question is heavily down voted . Even though it's undefined behaviour I have given the output and just asked for the explanation or was the question too trivial? — Zephyr, Aug 20 '17 at 11:12
@Zephyr - I can't know why it is heavily downvoted. I don't really see the reason. It's complete in the sense that it contains all information but still .... well, I don't know. Sometimes it seems that some users just downvote because they think the question is kind of "too simple" without looking at whether the question is well formulated. Conclusion... I don't know... just ignore it and go on. — Support Ukraine, Aug 20 '17 at 16:37
@AnttiHaapala the terminology "valid program" normally means one that doesn't violate any syntax or semantic rules of the standard. The set of programs that have valid syntax only includes a lot of nonsense, e.g. `int x = "hello";` — M.M, Aug 20 '17 at 21:28
@M.M well yeah, valid meaning follows the syntax and semantic constraints. Having behaviour not defined by C doesn't make the program *invalid*, otherwise pretty much all of the POSIX programs would be invalid. — Antti Haapala -- Слава Україні, Aug 21 '17 at 05:12

J...S · Answer 2 · 2017-08-20T09:02:39.767

When the pointer to integer ptr is incremented, in your case it advances by 2 bytes, i.e. ptr+1 points to "CDEFG". When the int read from this location is printed with %c, printf converts it to unsigned char, so only its low-order byte is used; on your machine that is the first byte, C.

But when it is treated as integer, 2 bytes will be considered as in your case sizeof(int) is 2.

In your machine the bits are like (because of endianness)

+----+----+----+----+----+----+---+---+---+---+---+---+---+---+---+---+
| 0  | 1  | 0  | 0  | 0  | 1  | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
+----+----+----+----+----+----+---+---+---+---+---+---+---+---+---+---+
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+----+----+----+----+----+----+---+---+---+---+---+---+---+---+---+---+

where the upper row are the bits and the lower row is some index I will use to refer to individual bits.

Bits 0-7 are the binary representation of the character 'C', whose ASCII value is 67. Bits 8-15 are the binary representation of the character 'D', whose ASCII value is 68.

ptr+1 points to location starting at bit 0. When the value there is considered as an integer (of size 2 bytes), both bytes will be used. ie, the binary of both 'C' and 'D' are used to find the value of the integer.

And the decimal value of this binary representation is 17475 which is the value that you got.

EDIT: It seems that your code can have undefined behaviour due to strict aliasing rule.

@4386427 How about 'point to location starting at bit 0' instead? — J...S, Aug 20 '17 at 09:03

score 0 · Answer 3 · answered Aug 20 '17 at 07:59

Your code causes undefined behaviour by violating the strict aliasing rule. Any output is meaningless.

You use an expression of int type (namely *(ptr+1)) to access memory. The strict aliasing rule says that this memory must have an effective type of int (or a related type such as const int, etc). But the memory actually contains char objects, so all bets are off.
