char c[4] = { 'A', '\0', '\0', '\0' };
int* pi = (int*)&c[0];
printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]);
printf("%x %x %x %x\n", *((unsigned char *)pi), *((unsigned char*)pi + 1), *((unsigned char*)pi + 2), ((unsigned char*)pi)[3]);
printf("%d %c\n", (int)c[0], c[0]);
printf("%d %c\n", *pi, (char)*pi);

In the above code, I declared a character-type array and I printed its contents. I can't understand why the integer printing line prints "65 A".

In this case, the memory that the character array occupies (4 bytes, if I'm not mistaken) should surely differ from the integer 65, because an int takes 4 bytes and a char takes one byte. 'A' is not even the last element.

Am I misunderstanding something?

Daniel Walker
  • What did you expect instead? "are definitely different from integer 65" What would they be, then? Do you know what *endianness* is? – Gerhardh Dec 13 '22 at 17:10
  • You could improve your question by providing the whole output, not only a line of it. – Gerhardh Dec 13 '22 at 17:11
  • Take a look at [Is casting byte array to int based on memory alignment safe?](https://stackoverflow.com/questions/33823635/is-casting-byte-array-to-int-based-on-memory-alignment-safe) – David Ranieri Dec 13 '22 at 17:12
  • I think the OP is only concerned with the final `printf` line. – Daniel Walker Dec 13 '22 at 17:12

2 Answers


char c[4] = { 'A', '\0', '\0', '\0' }; arranges for there to be four bytes in memory, of which the first has the value 65¹, and the remaining three have value 0.

int* pi = (int*)&c[0]; says to make pi point to the first of these bytes.

Because pi is a pointer to an int, *pi is an lvalue for an int. An lvalue is an expression that may designate an object in memory. Because it is for an int, using *pi for its value nominally tells the compiler to get four bytes² from memory.

The behavior of this is not defined by the C standard, because, although int* pi = (int*)&c[0]; nominally sets pi to point to the first byte of c and *pi nominally gets an int from memory, there are rules about how you may use these things in C, and this program violates the rules. Because of those violations, the behavior of the program is not defined by the C standard.

However, if the program does behave according to the nominal behavior, *pi gets the four bytes 65, 0, 0, and 0 and interprets them as the bytes that represent an int.

Some C implementations store the bytes of an int in memory with the lowest-value byte first in memory (at the lowest address), followed by the second lowest, then third, then fourth. Some store the bytes with the highest-value first, then the second highest, and so on. (It is also allowed to store the bytes in different orders. This is rare.) Your C implementation stores the bytes in the first order, which is called little-endian. In this order, the bytes 65, 0, 0, and 0 represent the value 65. So “65” was printed for *pi.

For (char)*pi, four bytes were fetched from memory and interpreted as an int, yielding the value 65. Then (char) converted this to a char, still with the value 65. This char is actually promoted back to an int to be passed to printf. The conversion specification %c requests that the character with this code, 65, be printed, so “A” was printed.

A proper way to reinterpret bytes as an int is to copy them in using memcpy (or a manual copy using a character type). This method has behavior defined by the C standard:

/* Needs <stdio.h> and <string.h>; assumes sizeof(int) == 4 (see footnote 2). */
char c[4] = { 'A', '\0', '\0', '\0' };
int i;
memcpy(&i, c, sizeof i);
printf("i = %d.\n", i);

Footnotes

1 Your C implementation uses ASCII for the character codes 1-127, and the ASCII code for “A” is 65. The C standard does not require that ASCII be used; C implementations may use other character codes.

2 I assume your C implementation uses four bytes for int. The C standard allows some flexibility in this.

Eric Postpischil
  • Byte-order is a matter of machine architecture rather than C implementation. – Clifford Dec 13 '22 at 20:11
  • @Clifford: Byte order is a matter of C implementation. Per C 2018 6.2.6 2, it is implementation-defined: “Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.” Most C implementations follow the machine architecture (if it only has one; some machines are configurable), for performance. However, implementations for special purposes, such as running old software, may define and implement byte order as they please, and this conforms to the C standard. – Eric Postpischil Dec 13 '22 at 20:49
  • Of course it is implementation defined - by the architecture it is implemented on, not through some arbitrary compiler implementation decision. You said it yourself "some _machines_ are configurable". An implementation that reversed the byte order from the natural order of the machine would generate large and slow code. I don't think the special cases you speak of really prove your point (even if they really exist). What you'd normally do in that case is emulate the target machine architecture and run the compiled code in that emulation - still architecture defined. – Clifford Dec 13 '22 at 22:19
  • @Clifford: A C implementation that reverses the byte order, even if it generates large and slow code, conforms to the C standard (in the absence of other issues). The decision about whether to do that or not belongs to the implementors, regardless of your opinion about it. – Eric Postpischil Dec 13 '22 at 22:22
  • Perhaps I should have preceded my comment with "In practice...". It was not an "opinion". In the context of the question: "_How can a **computer system**...?_" it seemed a useful and relevant clarification. Not intended as a correction or criticism. – Clifford Dec 13 '22 at 22:43

The array:

char c[4] = { 'A', '\0', '\0', '\0' };

In memory, from low address to high address, it appears as:

0x41 0x00 0x00 0x00

When interpreted as a 32-bit number on a little-endian architecture such as x86, the first byte is the least-significant byte. So the 32-bit integer value is 0x00000041 (or 65).

To construct 0x41000000 you would need to reverse the byte order:

char c[4] = { '\0', '\0', '\0', 'A' };

The line:

printf("%d %c\n", *pi, (char)*pi);

will then print 1090519040 followed by a non-printing NUL character.

More instructively:

printf("%d 0x%08X\n", *pi, (unsigned)*pi);

will print:

1090519040 0x41000000

and for your original byte-order:

65 0x00000041
Clifford