Why does the returned value of bit operations change every time?

Question

I am a beginner in computer science. I have learned that a pointer is a compound data type, which indicates the address of some data along with the data's type and size. Converting a pointer's type only changes the size that is read, not the start address. To confirm this, I ran an experiment.

See the code below. I changed the pointer type of the variable 'sample', but I think it still points to the first byte of sample; nothing changed but the size. Then I made the (char type) pointer jump left by one byte (that is "p = p-1" in the code). After that, I converted it back to a short pointer. I think the pointed-to data is 0x..24 (.. means the data in front of 0x2456). Finally, I used the bit operation "<<" to change it to 0x2400. However, I get a different, seemingly random number every time I run it.

   #include<stdio.h>

   int main(void){
       short sample = 0x2456;

       char *p = (char*) &sample; 
       p = p-1;
       printf("%d\n",*((short*)p)<<8 );
       return 0;
   }

is `p = p-1;` supposed to subtract 1 from 0x2456? 'cause it doesn't. It shifts the pointer to some other memory location. — , May 19 '20 at 05:05
Wait are you expecting to get 0x5656 or 0x2424 or something like that? — , May 19 '20 at 05:40
That's simple. Simply say `*p='\0'` . As it points to the lowest part of your short, it zeroes it and then try printing `sample` . — , May 19 '20 at 07:03
Well, the memory is laid out such that you have to jump right `p=p+1` to get to 0x24. — , May 19 '20 at 07:25
@Pranavappu Thanks for your reply, now I understand the reason. — XM Zg, May 19 '20 at 07:31

Basile Starynkevitch · Answer 1 · 2020-05-19T07:55:38.940

Most operating systems provide address space layout randomization (for cybersecurity reasons), including for the call stack of your main function called from crt0.

"I don't know why I can't get a fixed value."

ASLR might explain why running your program several times on your OS produces different output. Your p probably points to some weird location of the call stack.

Of course, read more about undefined behavior, and also Modern C, then the C11 specification, that is n1570.

If you use a recent GCC to compile your C code foo.c, consider compiling it with gcc -Wall -Wextra -O -S foo.c then look into the emitted assembler code foo.s. You'll then understand what value is passed to printf. It is implementation specific.

score 0 · Accepted Answer · 2020-05-19T08:54:54.677

From your code, I assume you want to shift 0x24 into the higher (actually lower) byte. Try p=p+1 and see if you get the desired result.

Instead of reading through the pointer as a short, if you do (int)*p << 8, you get 2400 every time.

Or you can do something crazy like initializing another variable after initializing the pointer, so that when we shift the pointer, it picks up part of that variable instead of garbage:

   #include<stdio.h>

   int main(void){
       short sample = 0x2456;

       char *p = (char*) &sample; 
       int zero = 0;
       p = p+1;
       printf("%x\n",*((short*)p) << 8 ); 
       return 0;
   }

You can even print 0x1224 (half of each variable) like this:

   #include<stdio.h>

   int main(void){
       short sample = 0x2456;

       char *p = (char*) &sample; 
       int zero = 0x12;
       p = p+1;
       printf("%x\n",*((short*)p));    //will print 1224
       return 0;
   }

Side note: I assumed both variables are in a single stack frame and that memory is little endian. Results may vary with compilers and target systems.

edited May 19 '20 at 08:54

answered May 19 '20 at 05:45

"if you do (int)*p << 8" If you do, you get strict aliasing violations, unless you get a misaligned access trap first. – Lundin May 19 '20 at 08:33
As for your examples, they make wild assumptions about a certain stack memory layout and endianess. You ought to mention that to the OP, so they don't end up thinking this code has some deterministic or expected result. – Lundin May 19 '20 at 08:35
@Lundin I tried `(int)*p <<8` and worked, that's why I posted it inside. Could you explain more about aliasing violation and misaligned trap? As for the 2nd comment, you're right and I'll put it as a sidenote. – May 19 '20 at 08:51
Strict aliasing FAQ here: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – Lundin May 19 '20 at 08:53
@Lundin: `p` is a `char *`. There is no aliasing violation. Were you thinking of `* (int *) p`? – Eric Postpischil May 19 '20 at 10:32
@EricPostpischil Ah yeah my bad. The `*(short*)p` is however some sort of effective type hiccup, because the compiler doesn't know what type that's stored there, after the pointer arithmetic on `p`. More serious is the dead certain misaligned access though. – Lundin May 19 '20 at 10:36

score 0 · Answer 3 · answered May 19 '20 at 08:23

There is C code for which there are guarantees by the language, and then there is undefined code with no guarantees of any particular behavior.

The C language does not allow all manner of wild type conversions and arbitrary pointer arithmetic - if you break the rules, anything can happen and no deterministic results are guaranteed. If you were to analyse such non-deterministic code, you might get any kind of result, including program crashes.

To analyse your code in detail and show where it goes wrong:

char *p = (char*) &sample; This is a valid conversion. There are special rules in C that allow conversion from any object type to a character type through a pointer conversion such as this one. You may de-reference the pointer after doing such a conversion.
However, char specifically has implementation-defined signedness, meaning that it behaves like either signed char or unsigned char depending on the compiler. If it is signed, you might get unexpected output when you de-reference that pointer.
p = p-1; is undefined behavior, anything can happen. We may only do pointer arithmetic on pointers that point at an array. For the purpose of determining if pointer arithmetic is valid, a pointer pointing at a single value variable ("scalar") is to be regarded as a pointer pointing at an array with 1 item.

Pointer arithmetic must result in a pointer pointing at an array item from index 0 to n in an allocated array, where n is the array size. Index n is the special case: C allows pointing 1 item beyond the array. It does not, however, allow pointer arithmetic that results in a pointer pointing 1 item before the array, as is done in your code.

(As a special case, we may iterate incrementally over any data type using a character pointer, for the purpose of analysing the raw data. The data is then to be regarded as an array of sizeof(data) characters. But your code does not do this.)
*(short*)p is also undefined behavior, because the compiler has no idea where p points now or what type is stored there. It is also undefined behavior on any system requiring aligned access, since the pointer is now most definitely misaligned.

Finally, the memory area you now access may be blocked for you by the system, giving you an access violation or segmentation fault. The nature and result of such errors is beyond the scope of C.
If your program manages to spit out some binary result despite all the above problems, that result is likely affected by CPU endianness. At the first line char *p = (char*) &sample;, p could point at either the MS byte or the LS byte, depending on the system.
Should the undefined binary goo from the previous remark happen to have the MSB set, you invoke undefined behavior again by left-shifting a negative integer.

Summary: I count 4 different cases of undefined behavior and 2 cases of implementation-defined behavior. There is no guaranteed or deterministic outcome to be expected from this code. There is nothing to learn from inspecting the results. You can only learn something by analysing why the code that gave such results is wrong.

Aren't all the initialized variables in a program treated as elements of a stack frame? As long as we stay inside the stack frame, I think it is possible to treat all data as a big array and traverse through it. As an example, if we say `int a=1,b=2,*p=&a` then `p--` and then print a,b,*p, *p will print b (if b is used, that is) — , May 19 '20 at 08:45
@Pranavappu The C language makes no such guarantees. You'd be relying on system-specific language extensions, so it's all about if the compiler port has documented behavior for such extensions. — Lundin, May 19 '20 at 08:48

John Colvin · Answer 4 · 2020-05-19T15:35:17.533

-1

I'm just going to take a shot in the dark: perhaps you want to print the value stored in the first 8 bits of sample (1 in this case). Note that this assumes memory is little endian:

short sample = 1;
char* p = (char*)&sample;
p = p - 1;
std::cout << "the first 8 bits of sample: " << (*(short*)p >> 8) << std::endl;

edited May 19 '20 at 15:35

answered May 19 '20 at 05:50

As a pointer points to the first byte of a multi-byte location, doing `p=p-1` will get you wrong value. You should do `p=p+1` if you need higher bytes . Also, physically memory is laid out in such a way that the lowest byte is at left most. So, you are pointing at the first 8-bits by default. – May 19 '20 at 06:20
@Pranavappu p=p-1 is supposed to be the wrong value, which is why it is cast back to a short* ptr, and the bit shift operator is applied, so the value printed is 1 (the value in sample). Does this not behave as I have described? – John Colvin May 19 '20 at 15:17
How is this different from your own answer @Pranavappu? You do p=p+1, followed by a bit shift 8 left. I do p=p-1, followed by a bit shift 8 right. – John Colvin May 19 '20 at 15:24
