6

In some C++ code I am reading, I found the following. Can anyone help me understand what the following statements do?

char buffer[4096];
// some code
int size = *(int*)(buffer);
Yu Hao
shaikh
  • Do you know what `(int*)buffer` means? If not, read about *casting*. Do you know what the unary dereference operator `*` does (for example, what happens if you have `*some_pointer`)? If not, read about that too. Then just combine the knowledge about each subject, as they are combined in the expression you're wondering about. – Some programmer dude May 11 '16 at 06:29
  • This statement can also cause an alignment exception on some architectures. – dbrank0 May 11 '16 at 06:33
  • This statement also causes undefined behavior. – 2501 May 11 '16 at 06:38
  • It's taking the address of `buffer`, casting it to `int *`, then dereferencing the resulting pointer, which is undefined behavior. In particular, there is no guarantee that `buffer` is aligned on an acceptable `int` boundary. The behavior is undefined. You should not attempt to run this code. – Tom Karzes May 11 '16 at 06:59

4 Answers

14
char buffer[4096]; // this is an array of 4096 characters
// some code

int size = *(int*)(buffer);

This casts the (decayed) character pointer, buffer, to an integer pointer, then dereferences it to get an integer value. The integer value you get will be composed of the first 4 character values of the buffer array, assuming the size of int is 4 bytes on your machine; in general, it will be composed of the first sizeof(int) characters.

In other words, the memory representation of the first sizeof(int) characters of the buffer array is treated as though it represents a single integer value, since it is now pointed to by an integer pointer, and that value is stored in the integer variable size when the pointer is dereferenced.

That being said, as has been stated repeatedly in the comments, this code is unsafe. One thing that comes to mind is that some CPUs have strict alignment requirements (see this answer), and in this case there is no guarantee that the address of the first element of the buffer array meets the alignment requirement of an int. A misaligned access is undefined behavior and may fault on those CPUs.
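
To make the alignment point concrete, here is a minimal sketch of how one might check whether an address is suitably aligned for an int. The helper name is_aligned_for_int is hypothetical (it is not from the code in the question), and the check assumes the usual flat mapping from pointers to std::uintptr_t, which is implementation-defined.

```cpp
#include <cstdint>  // std::uintptr_t

// Hypothetical helper: reports whether an address is a multiple of
// alignof(int). Assumes the pointer-to-integer conversion preserves the
// numeric address, which holds on mainstream platforms.
bool is_aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

A plain char array is only guaranteed to have the alignment of char, so this check may or may not pass for buffer. Note that passing the check does not make the cast well-defined; it only rules out the alignment problem.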

See @Lundin's answer for more reasons why this code is unsafe and may not give you the result you expect.
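
If the intent really is just to reinterpret the first sizeof(int) bytes as an int, a commonly used alternative is to copy the bytes with std::memcpy instead of casting the pointer. The sketch below assumes that intent; the helper name read_int_prefix is hypothetical. The result still depends on the byte order of the machine, but there is no alignment or aliasing problem.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical helper: reads an int from the start of a char buffer by
// copying its object representation, with no pointer cast involved.
// The caller must guarantee the buffer holds at least sizeof(int) bytes.
int read_int_prefix(const char* buffer) {
    int value;
    std::memcpy(&value, buffer, sizeof value);  // copies sizeof(int) bytes
    return value;
}
```

With the declarations from the question, the call would be int size = read_int_prefix(buffer);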

Biruk Abebe
  • @user694733, true, it could crash on some architectures due to a bad memory alignment for a pointer to `int` – David Ranieri May 11 '16 at 07:01
  • @user694733 Thanks for your feedback. I've tried to mention one reason that I know will result in undefined behavior on some CPUs. The answer by Lundin goes into more detail about the issues with this code, and I feel like I would be echoing what has already been said. – Biruk Abebe May 11 '16 at 08:06
8

TL;DR: this code is bad, forget about it and move on.


(buffer) These parentheses are redundant; they suggest that the programmer was insecure about their own programming abilities.

Since buffer is an array of characters, using the identifier buffer on its own gives you a pointer to the first element: a char pointer.

(int*) This is a cast, converting the char pointer to an int pointer.

* dereferences that int pointer, and the resulting value is stored in the integer size.

Please note that this code is completely unsafe. Many pointer conversions invoke poorly-defined behavior. There might be alignment issues. There might be pointer aliasing issues (Google "strict aliasing rule"). This particular code is also endianness-dependent, meaning that it requires the contents of the character array to have a given byte order.
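
To illustrate the endianness point, here is a sketch that decodes a 32-bit value byte by byte, so the result does not depend on the byte order of the machine running the code. It assumes the data in the buffer is stored least-significant byte first (little-endian); that is an assumption for illustration, since the question does not say which byte order the buffer uses. The helper name load_u32_le is hypothetical.

```cpp
#include <cstdint>  // std::uint32_t

// Hypothetical helper: decodes a 32-bit unsigned value from 4 bytes stored
// least-significant byte first, regardless of the host's byte order.
std::uint32_t load_u32_le(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

For big-endian data the shifts would simply be applied in the opposite order. Because the bytes are read through unsigned char, there is no aliasing or alignment issue either.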

Overall, it does not make any sense to use signed types like int or char (which may be signed) when doing things like this. In particular, the char type is very problematic since it has implementation-defined signedness and should be avoided. Use unsigned char or uint8_t instead.

Slightly less bad code would look something like this:

#include <stdint.h>

uint8_t buffer[4096];
// some code
uint32_t size = *(uint32_t*)buffer;
Lundin
  • I think your example code should not use the pointer dereference trick at all. Using `memcpy` would still have a possible endianness issue, but at least there would not be UB. – user694733 May 11 '16 at 06:46
  • How is code that still has the same undefined behavior less bad? – 2501 May 11 '16 at 08:18
  • @2501 Because it doesn't have the mentioned type problems. Demonstrating how to fix that was the purpose of the code snippet. To fix the undefined behavior, you would need to do something far more drastic. – Lundin May 11 '16 at 10:58
  • There's a problem with the proposed alternative: it imposes a constraint that is not present in the original code, namely, that the target hardware must provide an 8-bit native type and a 32-bit native type. That might be a valid change, but code samples that impose different requirements from the original code should point out those differences. `unsigned char` will always exist; `uint8_t` might not. – Pete Becker May 11 '16 at 12:33
3

Can anyone help me understand what does the following statements do?

The first statement:

char buffer[4096];

declares an array of chars with size 4096.

The second statement:

int size = *(int*)(buffer);

1. First, the array buffer decays to a pointer to its first element (a char*)

2. Then casts it to pointer to int, or int*

3. Finally, dereferences this pointer and uses the resulting value (of type int) to initialize the variable size.
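
The first step can be checked without running into the undefined behavior of the later steps. The sketch below uses type traits to show that the array decays to a char* pointing at its first element, while &buffer is a different type (a pointer to the whole array). The function name decayed_pointer_is_first_element is hypothetical and exists only to give a runtime check.

```cpp
#include <type_traits>

char buffer[4096];

// `buffer` itself has array type; it is not a pointer.
static_assert(std::is_same<decltype(buffer), char[4096]>::value,
              "buffer is an array");

// When used as a value, the array decays to a pointer to its first element.
static_assert(std::is_same<std::decay<decltype(buffer)>::type, char*>::value,
              "buffer decays to char*");

// &buffer is the address of the whole array, which has a different type.
static_assert(std::is_same<decltype(&buffer), char (*)[4096]>::value,
              "&buffer is a pointer to an array of 4096 char");

// Hypothetical helper: confirms the decayed pointer is &buffer[0].
bool decayed_pointer_is_first_element() {
    char* p = buffer;  // array-to-pointer decay happens here
    return p == &buffer[0];
}
```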

Marievi
  • "first takes the address of array `buffer`" is wrong, getting the address of `buffer` would be `&buffer`. Instead `buffer` *decays to a pointer* to its first element. – Some programmer dude May 11 '16 at 06:33
  • I cannot upvote until the answer also explains why this code is broken and should not be used. – user694733 May 11 '16 at 06:52
  • @user694733 Indeed, if only one answer clearly explained why this is ub. – 2501 May 11 '16 at 07:11
1

It takes the address of buffer[0], casts it to an int*, dereferences that, and uses the dereferenced value to initialize size. In other words, it takes the first sizeof(int) bytes of buffer, pretends those bytes are an int, and stores that int's value in size.

Benjamin Lindley