1

I'm confused about casting a char pointer to an int pointer. I'm checking how pointer casting works, and the code below (int to char) works fine.

#include <iostream>
using namespace std;
int main(){
    int a=65;
    void *p=&a;
    cout << *static_cast<char*>(p);
}

Output

A

But when I try to cast from char to int, it doesn't show the correct value.

#include <iostream>
using namespace std;
int main(){
    char a='A';
    void *p=&a;
    cout << *static_cast<int*>(p);

}

What is the problem with the above code? The output is a garbage value.

  • 1
    You can treat every object as an array of chars. But you cannot treat an array of chars as any other object. – Kerrek SB Jul 09 '15 at 13:42
  • A `char` typically isn't the same size as an `int`. If you try to read 4 bytes (or whatever the size of an `int` happens to be on your system) from a variable where you've only defined a single byte (a `char`), you're going to end up with something that looks like garbage. – Michael Jul 09 '15 at 13:42
  • @Michael That actually isn't even relevant due to strict aliasing rules. – Šimon Tóth Jul 09 '15 at 13:45
  • @Let_Me_Be: How so? g++ 4.8.2 happily compiles the erroneous code without any complaints. There may be a rule that says that doing this sort of thing is UB, but it's still possible to compile code like that (with some compilers), and what ends up happening at runtime is likely to be what I mentioned in my previous comment. – Michael Jul 09 '15 at 13:57
  • @Michael If the fact that one compiler compiles your code fine is your measure of correctness then please do not give advice on this site. – Šimon Tóth Jul 09 '15 at 13:59
  • @Let_Me_Be there is a difference between "is this UB?" and "why is this happening?" It is still important to understand how most architectures work. If this were tagged C++11, C++14, or language-lawyer, then you could look at the standards; this is just tagged C++ – Glenn Teitelbaum Jul 09 '15 at 14:02
  • @Michael the thing is that it could break with every compiler update or when switching to a different compiler. Also, you can't guarantee it with this information: what if he had an array of 4 chars and the program still showed garbage because the compiler treats it as undefined behaviour? – AliciaBytes Jul 09 '15 at 14:04
  • @Let_Me_Be: That to me is a bit presumptive and hostile. I'm saying that it's entirely possible to do what the OP attempted, without jumping through any hoops. And I was trying to explain the likely reason for what the OP was seeing at runtime. – Michael Jul 09 '15 at 14:05
  • @Michael The problem is that this code is wrong, because it is violating one of the language rules. Not because the sizes of int and char differ. Such answer is misleading. – Šimon Tóth Jul 09 '15 at 14:06
  • @Michael Great, there is no problem in showing why you are seeing a particular result on a particular architecture/platform. But do not lead with that and do not represent it as the "reason for this behavior". – Šimon Tóth Jul 09 '15 at 14:07
  • @Let_Me_Be any issues with my latest edit? – Glenn Teitelbaum Jul 09 '15 at 14:12
  • 2
    I find this reference for C++ http://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing to be helpful in understanding type aliasing and the rules associated with it. And this for C http://en.cppreference.com/w/c/language/object#Strict_aliasing – Niall Jul 09 '15 at 14:14

6 Answers

6

First, you have to understand that the x86 architecture is what is called little-endian. This means that in multibyte variables, the bytes are ordered in memory from least to most significant. If you don't understand what that means, it'll become clear in a second.

A char is 8 bits -- one byte. When you store 'A' into one, it gets the value 0x41 and is happy. An int is larger; on many architectures it is 32 bits -- 4 bytes. When you assign the value 'A' to an int, it gets the value 0x00000041. This is numerically exactly the same, but there are three extra bytes of zeros in the int.

So your int contains 0x00000041. In memory, that is arranged in bytes, and because you're on a little-endian architecture, those bytes are arranged from least to most significant -- the opposite of how we normally write them! The memory actually looks like this:

      +----+----+----+----+
int:  | 41 | 00 | 00 | 00 |
      +----+----+----+----+
      +----+
char: | 41 |
      +----+

When you take a pointer to the int and cast it to a char*, and then dereference it, the compiler will take the first byte of the int -- because chars are only one byte wide -- and print it out. The other three bytes get ignored! Now look back and notice that if the order of the bytes in the int were reversed, as on a big-endian architecture, you would have retrieved the value zero instead! So the behavior of this code -- the fact that the cast from int* to char* worked as you expected -- was strictly dependent on the machine you were running it on.
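
You can watch this happen directly. Reading an object's bytes through an unsigned char* is the direction the aliasing rules do allow, so a minimal sketch like this (assuming a 4-byte int) prints the bytes in memory order:

#include <iostream>
using namespace std;
int main(){
    int a = 65;  // 0x00000041, assuming a 4-byte int
    // Reading any object's bytes through unsigned char* is allowed.
    unsigned char *bytes = reinterpret_cast<unsigned char*>(&a);
    for (unsigned i = 0; i < sizeof a; ++i)
        cout << hex << static_cast<int>(bytes[i]) << ' ';
    cout << '\n';  // on a little-endian machine: 41 0 0 0
}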

On the other hand, when you take a pointer to the char and cast it to an int*, and then dereference it, the compiler will grab the one byte in the char as you'd expect, but then it will also read three more bytes past it, because ints are four bytes wide! What is in those three bytes? You don't know! Your memory looks like this:

      +----+
char: | 41 |
      +----+
      +----+----+----+----+
int:  | 41 | ?? | ?? | ?? |
      +----+----+----+----+

You get a garbage value in your int because you're reading memory that is uninitialized. On a different platform or under a different planetary alignment, your code might work perfectly fine, or it might segfault and crash. There's no telling. This is what is known as undefined behavior, and it is a dangerous game that we play with our compilers. We have to be very careful when working with memory like this; there's nothing scarier than nondeterministic code.
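
If you actually need to assemble an int from bytes you control, the well-defined way is to copy the bytes instead of aliasing them. A minimal sketch using std::memcpy (assuming a 4-byte little-endian int, so it prints 65):

#include <iostream>
#include <cstring>
using namespace std;
int main(){
    char buf[sizeof(int)] = {'A'};  // remaining bytes are zero-initialized
    int n;
    memcpy(&n, buf, sizeof n);  // copies the bytes; no aliasing violation
    cout << n << '\n';
}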

acwaters
  • Sorry for the late response. I just want to ask: does the compiler read binary from left to right or right to left? As you mentioned, you start reading from the right side. I know it's a stupid question, but just to confirm. –  Jul 11 '15 at 18:05
  • @LetDoit, the examples I gave above all go from lower to higher addresses left-to-right; that is, for an integer on an x86 (little-endian) CPU, the lowest (first) byte is the least significant byte, and the highest (last) byte is the most significant -- which is backwards from how we normally write numbers on paper. So if you imagine all your memory laid out in bytes, with byte 0 on the far left, your integer will take up the four bytes from X to X+3, where the value in byte X is 0x41 and the rest are zeros. – acwaters Jul 12 '15 at 01:10
  • Let's suppose I have 4 bytes, `1010`, and a char needs to read one byte. What will be the output for the char: the 0 that is on the right side or the 1 that is on the left side? –  Jul 12 '15 at 01:19
  • Write the number down in binary or hexadecimal; the `char` ends up with the lowest byte, the one you probably wrote farthest right. You can think of it as applying a 0xFF mask or taking your number modulo 256. The reason this actually happens, assuming you're type punning through pointers as you did in your original question, is because when the computer stores the number, it actually stores the bytes ordered from low to high. – acwaters Jul 12 '15 at 03:46
4

You can safely represent anything as an array of char. It doesn't work the other way. This is part of the STRICT ALIASING rule.

You can read up on strict aliasing in other questions: What is the strict aliasing rule?

More closely related to your question: Once again: strict aliasing rule and char*
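
To make the one-way rule concrete, here is a minimal sketch; the reverse direction is left commented out because evaluating it would be undefined behavior:

#include <iostream>
using namespace std;
int main(){
    int i = 65;
    char c = 'A';
    // OK: any object may be examined through a char pointer.
    cout << *reinterpret_cast<char*>(&i) << '\n';  // prints the first byte
    // Undefined behavior: a char object may NOT be read through an int
    // pointer, and the read would also run past the one-byte object:
    // cout << *reinterpret_cast<int*>(&c) << '\n';
}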

Šimon Tóth
1

Quoting the answer given here: What is the strict aliasing rule?

[...] dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

Also related to your question: Once again: strict aliasing rule and char*

Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.

(I must give credit for this second link to @Let_Me_Be)

Francis Moy
1

According to the Standard, accessing a char (or multiple chars) through an int lvalue is undefined behavior, and therefore any result is allowed. Most compilers will try to do what makes sense, so the following is a likely reason for the behavior you are seeing on your specific architecture:

Assuming a 32-bit int, an int is the same size as 4 chars.

Different architectures will treat those four bytes differently when translating them to an int value; most commonly this is either little-endian or big-endian.

Looking at:

[Byte1][Byte2][Byte3][Byte4]

The int value would either be:

(Little endian) Byte1 + Byte2*256 + Byte3*256^2 + Byte4*256^3
(Big endian   ) Byte4 + Byte3*256 + Byte2*256^2 + Byte1*256^3

In your case, either Byte1 or Byte4 is being set; the remaining bytes are whatever happens to be in memory, since you are only reserving one byte where you need four.
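
As a worked example: storing 'A' sets that one byte to 0x41 = 65, so on a little-endian machine the value read back is 65 + Byte2*256 + Byte3*256^2 + Byte4*256^3; only if the three leftover bytes happen to be zero do you get 65.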

Try the following:

#include <iostream>
using namespace std;
int main(){
    char a[4]={'A', 0, 0, 0};  // reserve a full int's worth of bytes
    void *p=a;
    cout << *static_cast<int*>(p);
}

You may have to switch the initialization to {0, 0, 0, 'A'} to get what you want, depending on the architecture.

As noted, this is undefined behavior, but should work with most compilers and give you a better idea of what is going on under the hood

Glenn Teitelbaum
1

Here, when you do:

cout << *static_cast<int*>(p);

you are actually saying that p points to an integer (represented by 4 bytes in memory), but you only wrote a char into it before (represented by 1 byte in memory), so when you read it as an integer your variable is expanded with 3 garbage bytes.

But if you cast the result back to a char, you will get your 'A', because you are slicing the int down to a char:

cout << (char) *static_cast<int*>(p);

Otherwise, if you just want the ASCII value, cast your void* to a char* (so that dereferencing it only accesses 1 byte) and then cast the resulting char to int.

char a = 'A';
void *p=&a;
cout << static_cast<int>(*((char*)p));

The point is that a static_cast from char to int understands that you want to convert the value (and get its ASCII code), but casting a char* to an int* just changes the number of bytes read when you dereference the pointer.
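
A minimal sketch contrasting the two (the pointer version is commented out because it is undefined behavior):

#include <iostream>
using namespace std;
int main(){
    char c = 'A';
    int v = static_cast<int>(c);  // converts the value: v == 65
    cout << v << '\n';
    // *reinterpret_cast<int*>(&c) would not convert anything; it would
    // read sizeof(int) bytes starting at c -- undefined behavior.
}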

yayg
-1

Consider the following code:

#include <iostream>
#include <iomanip>
using namespace std;
int main(){
  {
    int a=65;
    cout << hex << static_cast<int>(a) << "\n";
    void *p=&a;
    cout << hex << setfill('0') << setw(2 * sizeof(int)) << *static_cast<int*>(p) << "\n";
  }

  {
    char a='A';
    cout << hex << static_cast<int>(a) << "\n";
    void *p=&a;
    cout << hex << *static_cast<int*>(p) << "\n";
  }
}

There is indeed the 'A' character code (0x41) in the output, but it's padded to the size of an int with uninitialized values. You can see it when you output the hexadecimal values of the variables.

kipu44