2

I'm really confused by how uint32_t pointers work in C++

I was just fiddling around trying to learn TEA, and I didn't understand why the encrypt function takes a uint32_t* parameter and then, inside the function, declares uint32_t variables and initializes them from that parameter as if it were an array.

Like this:

void encrypt (uint32_t* v, uint32_t* k) {
    uint32_t v0=v[0], v1=v[1], sum=0, i;

So I decided to play around with uint32_t pointers, and wrote this short code:

#include <cstdint>
#include <iostream>
using namespace std;

int main ()
{
    uint32_t *plain_text;
    uint32_t key;
    unsigned int temp = 123232;
    plain_text = &temp;
    key = 7744;

    cout << plain_text[1] << endl;

    return 0;
}

And it blew my mind when the output was the value of "key". I have no idea how it works... and then when I tried with plain_text[0], it came back with the value of "temp".

So I'm stuck as hell trying to understand what's happening.

Looking back at the TEA code, is the uint32_t* v pointing to an array rather than a single unsigned int? And was what I did just a fluke?

Peter Mortensen
Andronius

4 Answers

3

uint32_t is a type. It means unsigned 32-bit integer. On your system it is probably a typedef name for unsigned int.
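These properties can be checked at compile time. The following is a minimal sketch, not part of the original answer; the helper names `uint32_is_unsigned_int` and `uint32_max_value` are made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

// uint32_t, where it exists, is an unsigned type that is exactly 32 bits wide.
static_assert(std::is_unsigned<std::uint32_t>::value, "uint32_t is unsigned");
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is exactly 32 bits");

// Hypothetical helper: true when uint32_t is the very same type as unsigned int.
// This is common on desktop platforms but not guaranteed by the standard.
constexpr bool uint32_is_unsigned_int() {
    return std::is_same<std::uint32_t, unsigned int>::value;
}

// Hypothetical helper: the largest value a uint32_t can hold (2^32 - 1).
constexpr std::uint32_t uint32_max_value() {
    return UINT32_MAX;
}
```

If `uint32_is_unsigned_int()` is true on your platform, assigning `&temp` (an `unsigned int*`) to a `uint32_t*` compiles without a cast, as it does in the question.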

There's nothing special about a pointer to this particular type; you can have pointers to any type.

The [] in C and C++ is actually pointer-indexing notation: p[n] is defined as *(p + n). So p[0] retrieves the value at the location the pointer points to, p[1] gets the value at the next location after that (one element further on, not one byte), p[2] the one after that, and so on.

You can use this notation with arrays too because the name of an array is converted to a pointer to its first element when used like this.
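As a small illustration, assuming a genuine two-element array so that every access is in bounds, the subscript form and the explicit pointer-arithmetic form read the same element. The function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads element n using subscript notation.
std::uint32_t read_by_index(const std::uint32_t* p, std::size_t n) {
    return p[n];
}

// Reads element n using explicit pointer arithmetic; equivalent to p[n].
std::uint32_t read_by_arithmetic(const std::uint32_t* p, std::size_t n) {
    return *(p + n);
}
```

Passing an array such as `std::uint32_t block[2] = {123232u, 7744u};` to either function works because `block` is converted to a pointer to `block[0]`. Both calls are valid only while `n` stays inside the array.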

So, your code plain_text[1] tries to read the next element after temp. Since temp is not actually an array, this causes undefined behaviour. In your particular case, the manifestation of this undefined behaviour is that it managed to read the memory address after temp without crashing, and that address was the same address where key is stored.
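To answer the last part of the question: yes, the TEA function expects v to point to the first of two uint32_t values and k to the first of four. Below is a sketch of how it is meant to be called. The function bodies follow the widely published reference form of TEA, reproduced here as an assumption since the question shows only the first two lines; the names `tea_encrypt` and `tea_decrypt` are made up.

```cpp
#include <cassert>
#include <cstdint>

// v must point to 2 valid uint32_t values (one 64-bit block),
// k must point to 4 valid uint32_t values (the 128-bit key).
void tea_encrypt(std::uint32_t* v, const std::uint32_t* k) {
    std::uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const std::uint32_t delta = 0x9E3779B9u;  // key schedule constant
    for (int i = 0; i < 32; ++i) {
        sum += delta;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0;
    v[1] = v1;
}

// Undoes tea_encrypt by running the same rounds in reverse order.
void tea_decrypt(std::uint32_t* v, const std::uint32_t* k) {
    const std::uint32_t delta = 0x9E3779B9u;
    std::uint32_t v0 = v[0], v1 = v[1];
    std::uint32_t sum = static_cast<std::uint32_t>(delta << 5);  // delta * 32, wrapped to 32 bits
    for (int i = 0; i < 32; ++i) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= delta;
    }
    v[0] = v0;
    v[1] = v1;
}
```

A caller would declare real arrays, for example `std::uint32_t block[2]` and `const std::uint32_t key[4]`, and pass them directly. Passing the address of a single `unsigned int`, as in the question, would make `v[1]` an out-of-bounds access.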

M.M
  • Ohh thanks for the explanation. I took the TEA code off wiki, which was written for C, so I guess I needed to rewrite it a bit for it to work properly in C++. – Andronius Jun 17 '15 at 06:09
  • Why is reading from `plain_text[1]` undefined behaviour? I understand that writing to it is. – rozina Jun 17 '15 at 06:15
  • @Andronius this part is the same in C as C++. You need to stop accessing out of bounds of variables! – M.M Jun 17 '15 at 06:36
  • @rozina because pointers may only be used to access the object they were assigned to point to – M.M Jun 17 '15 at 06:37
  • @rozina: Because the standard says so. And the standard says so because it would be ridiculous to require any particular behavior (even *implementation defined*) here. And even in practice you can get a variety of behaviors: e.g. segmentation fault. –  Jun 17 '15 at 06:38
  • @Hurkyl I would be very surprised that reading from memory would ever produce a segmentation fault. Writing to it is another matter. So the OP code is UB, how about the following code: `int buffer[10]; std::cout << buffer[1] << std::endl; ` To me this is equivalent code to OP's code. I doubt this is undefined. The value of `buffer[1]` is undefined, but the act of reading the memory location should not be. – rozina Jun 17 '15 at 08:16
  • @rozina: Why would reading be any different than writing in regard to segmentation faults? If you're not allowed to access a segment, you're not allowed to access it. Regarding `buffer[1]`, that's a perfectly valid (reference to an) object of type `int` whose only deficiency is being *uninitialized*. –  Jun 17 '15 at 09:02
  • ... also you should keep in mind that C describes a memory model where every object you ever create is completely isolated from every other object (caveat: an `int[10]`, of course, is composed of ten `int` objects that are 'consecutive' as far as pointer arithmetic is concerned, and other similar things). You're not even allowed to subtract pointers pointing into distinct objects; `int x, y; ptrdiff_t d = &y - &x;` is undefined behavior. An implementation can give it whatever meaning it wants, but the C standard itself offers no restrictions. –  Jun 17 '15 at 09:07
1

Formally your program has undefined behavior.

The expression plain_text[1] is equivalent to *(plain_text + 1) ([expr.sub] / 1). Although you can point to one past the end of an array (objects that aren't arrays are still considered single-element arrays for the purposes of pointer arithmetic ([expr.unary.op] / 3)), you cannot dereference this address ([expr.unary.op] / 1).

At this point the compiler can do whatever it wants. In this case it has simply treated the expression as if it were indexing an array, so that plain_text + 1, i.e. &temp + 1, points to the next uint32_t-sized slot on the stack, which by coincidence is key.

You can see what's going on if you look at the assembly:

mov DWORD PTR -16[rbp], 123232 ; unsigned int temp=123232;
lea rax, -16[rbp]
mov QWORD PTR -8[rbp], rax     ; plain_text=&temp;
mov DWORD PTR -12[rbp], 7744   ; key=7744;
mov rax, QWORD PTR -8[rbp]
add rax, 4                     ; plain_text[1], i.e. -16[rbp] + 4 == -12[rbp] == key
mov eax, DWORD PTR [rax]
mov edx, eax
mov rcx, QWORD PTR .refptr._ZSt4cout[rip]
call    _ZNSolsEj              ; std::ostream::operator<<(unsigned int)
mov rdx, QWORD PTR .refptr._ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_[rip]
mov rcx, rax
call    _ZNSolsEPFRSoS_E       ; std::ostream::operator<<(std::ostream& (*)(std::ostream&))
mov eax, 0
add rsp, 48
pop rbp
ret
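The same layout question can be asked from C++ without dereferencing anything. This is a sketch with a hypothetical helper name: forming a + 1 is allowed (a single object counts as a one-element array for this purpose), and only the addresses are compared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if b sits exactly one element after a.
// It only compares addresses; the pointer a + 1 is never dereferenced.
bool is_next_element(const std::uint32_t* a, const std::uint32_t* b) {
    return a + 1 == b;
}
```

For two elements of the same array the answer is reliable. For two unrelated locals such as temp and key, the result depends on how the compiler lays out the stack frame and can change with the compiler or optimization level, which is why the output seen in the question cannot be relied upon.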
user657267
  • Thanks for the info! Assembly is well beyond me but the logic really cleared things up for this little accident – Andronius Jun 17 '15 at 06:10
  • Are you sure it is UB? I would think that `pointer[n]` is always well defined. In OP's case there is just no guarantee what will be at the memory address of `plain_text[1]`. Where `plain_text[1]` points to is very well defined - the next memory location after `plain_text[0]`. – rozina Jun 17 '15 at 06:11
  • If it's any help, plain_text[2] returns an int value too. It wasn't anything I've typed, but it came back with something – Andronius Jun 17 '15 at 06:14
  • @rozina It is absolutely UB, the only time you can use pointer arithmetic is when pointing to arrays, and expressions can only result in a pointer to an element, or one past the end. – user657267 Jun 17 '15 at 06:16
  • @Andronius It's just going to show you whatever garbage value is at that particular address, don't do this. – user657267 Jun 17 '15 at 06:17
  • I would like to see some evidence that it is undefined behaviour, I can't take your word for it. Your last comment also proves it is not undefined, since you know what the result will be - so it is well defined. What will be at the memory address is undefined in this case ofc. But reading from a specific memory address is not undefined. – rozina Jun 17 '15 at 06:18
  • @rozina [expr.add] / 4, the pointer used in the expression, and the result of the expression, must refer to the same **array** object, or one past the end. `temp` isn't an array. – user657267 Jun 17 '15 at 06:19
  • @rozina There is a note that mentions that `An object that is not an array element is considered to belong to a single-element array for this purpose`, so in this particular case the arithmetic itself is not undefined, but the act of dereferencing it certainly is. I'll update the question. – user657267 Jun 17 '15 at 06:26
0

In C and C++ arrays decay to pointers in most expressions, which is why arrays and pointers can be indexed the same way.

a[1]

when a is a pointer to (or an array of) a simple type is equivalent to

*(a + 1)

If a is an array of simple types, a will decay at the earliest opportunity to the address of element 0.

#include <iostream>

int main() {
    int arr[5] = { 0, 1, 2, 3, 4 };
    int i = 10;

    int* ptr;

    ptr = arr; // arr decays to the address of arr[0]
    std::cout << *ptr << "\n"; // outputs 0
    ptr = &arr[0]; // same address
    std::cout << *ptr << "\n"; // outputs 0
    std::cout << ptr[4] << "\n"; // outputs 4
    std::cout << *(ptr + 4) << "\n"; // outputs 4
    ptr = &i;
    std::cout << *ptr << "\n"; // outputs 10
    std::cout << ptr[0] << "\n"; // outputs 10, same as *ptr
    std::cout << ptr[1] << "\n"; // UNDEFINED BEHAVIOR: i is not an array.
    std::cout << *(ptr + 1) << "\n"; // UNDEFINED BEHAVIOR.
}

To understand ptr[0] and ptr[1] you simply have to understand pointer arithmetic.
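The key rule is that adding n to a pointer moves it by n elements, that is n * sizeof(T) bytes, not n bytes. A minimal sketch that measures this, assuming the pointer stays inside one array or one past its end; `byte_distance` is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes between base and base + n.
// Valid only when base + n is inside the same array or one past its end.
template <typename T>
std::ptrdiff_t byte_distance(const T* base, std::size_t n) {
    const char* from = reinterpret_cast<const char*>(base);
    const char* to = reinterpret_cast<const char*>(base + n);
    return to - from;
}
```

For an `int arr[5]` on a platform with 4-byte `int`, `byte_distance(arr, 1)` is 4 and `byte_distance(arr, 4)` is 16, which is exactly the step that `ptr[4]` takes from `ptr[0]`.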

kfsone
-1
unsigned int temp; // In memory, four bytes are reserved for temp

uint32_t key; // In this particular build, the compiler happened to place key in the four bytes right after temp

Thus: &plain_text[0] is &temp (plain_text points to temp), and &plain_text[1] refers to the next four bytes, which here happen to be at &key.

This layout is not guaranteed, but it may explain that behaviour.

Peter Mortensen
elnineo