Is dereferencing a pointer notably slower than just accessing that value directly? I suppose my question is: how fast is the dereference operator?
-
Is it speed or memory you are concerned about? – ruslik Nov 02 '11 at 16:19
-
Is your question about speed or memory? – Doug T. Nov 02 '11 at 16:19
-
Impossible to answer in general. A memory access that hits in the L1 cache will be hundreds (possibly thousands) of times faster than one that has to access actual RAM. Do not speculate; benchmark. – Nemo Nov 02 '11 at 16:25
-
If by "directly" you mean `array[i]` instead of `*(ptr)`, then there is no difference whatsoever. `array[i]` is shorthand for `*(array+i)`, and you get the `+i` for free on modern CPUs – MSalters Nov 03 '11 at 12:30
-
No, I just wondered how much pointer indirection actually costs – user965369 Nov 03 '11 at 16:43
5 Answers
Going through a pointer indirection can be much slower because of how a modern CPU works, but it has little to do with memory use at runtime.
Instead, speed is affected by prediction and cache.
Prediction is easy when the pointer has not been changed or when it is changed in predictable ways (for example, increment or decrement by four in a loop). This allows the CPU to essentially run ahead of the actual code execution, figure out what the pointer value is going to be, and load that address into cache. Prediction becomes impossible when the pointer value is built by a complex expression like a hash function.
Cache comes into play because the pointer might point into memory that isn't in cache, which then has to be fetched. This is minimized if prediction works, but if prediction is impossible you can take a double hit in the worst case: the pointer itself is not in cache and neither is its target. In that worst case the CPU stalls twice.
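As an illustration of the difference (this sketch and its names are mine, not the answerer's), compare a sequential walk, whose addresses the prefetcher can predict, with a pointer chase, where each address is the result of the previous load:

```cpp
#include <vector>

// Sequential access: the address advances by a fixed stride each iteration,
// so the hardware prefetcher can predict it and pull cache lines in early.
long sum_sequential(const std::vector<long>& v) {
    long total = 0;
    for (long x : v) total += x;
    return total;
}

// Pointer chasing: the next address comes out of the previous load,
// so the CPU cannot begin the next fetch until the current one completes.
struct Node { long value; Node* next; };

long sum_chain(const Node* head) {
    long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next)
        total += n->value;
    return total;
}
```

Both functions do the same amount of arithmetic; any timing difference on a large, cache-cold data set comes from the access pattern.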
If the pointer is a function pointer, the CPU's branch predictor comes into play. With C++ virtual tables the function addresses are constant, so the predictor has it easy: the CPU will have the target code in the pipeline by the time execution reaches the indirect jump. But with an unpredictable function pointer the impact can be heavy, because each mispredicted jump forces a pipeline flush, wasting 20–40 CPU cycles.
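A minimal sketch of the indirect-call case (the names here are illustrative): the CPU only learns the jump target after loading `op`, so the branch predictor has to guess it ahead of time.

```cpp
// Which function runs is decided by data at runtime.
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

using BinOp = int (*)(int, int);

// The call through `op` compiles to an indirect jump; if the predictor
// guesses the wrong target, the pipeline is flushed.
int apply(BinOp op, int a, int b) {
    return op(a, b);
}
```

If `apply` is always called with the same `op` (as with a monomorphic virtual call), the predictor learns the target quickly; if `op` varies unpredictably per call, each call risks a flush.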

-
Ok, so if I have a couple of nested loops that use a buffer (every time it uses the buffer it dereferences a pointer twice in this function), the pointer values will (presumably) be predictable? And it won't really matter overall? – user965369 Nov 02 '11 at 16:34
-
If the buffer is small enough to fit in cache, the memory accesses will be very fast whether they are predictable or not. – Nemo Nov 02 '11 at 16:37
-
@user965369: Do your best to make sure the buffer fits into L1 cache, if you cannot do that, at least make it fit into L2 cache. For maximum speed process larger buffers in cache-sized blocks. – Zan Lynx Nov 02 '11 at 16:45
Depends on stuff like:
- whether the "directly accessed" value is in a register already, or on the stack (that's also a pointer indirection)
- whether the target address is in cache already
- the cache architecture, bus architecture etc.
i.e., too many variables to usefully speculate about without narrowing it down.
If you really want to know, benchmark it on your specific hardware.
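A minimal benchmarking sketch along those lines (function names are made up; real measurements need many repetitions, and the optimizer may hoist the extra pointer load out of the loop, so treat results with care):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum a buffer accessed directly.
long sum_direct(const std::vector<long>& buf) {
    long total = 0;
    for (long x : buf) total += x;
    return total;
}

// Sum the same buffer through an extra level of indirection:
// each iteration conceptually loads *pp first, then (*pp)[i].
long sum_via_pointer(long* const* pp, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += (*pp)[i];
    return total;
}

// Time a callable, in nanoseconds.
template <class F>
long long time_ns(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Run each variant several times on a buffer sized to match your real workload and compare the medians; a profiler such as perf will give more trustworthy numbers than a single wall-clock pass.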

It requires one extra memory access:
- read the address stored in the pointer variable
- read the value at that address
This is not necessarily just two simple operations, because the second read may take much longer if the target address is not already loaded in the cache.
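The two reads can be made explicit in a tiny example (the names are made up for illustration):

```cpp
// Contrast one load with two dependent loads.
int demo() {
    int value = 42;
    int* ptr = &value;

    int direct   = value;  // one access: read value
    int indirect = *ptr;   // two accesses: read ptr, then read what it points at
    return direct + indirect;
}
```

In practice the compiler will often keep `ptr` (or even `value`) in a register, collapsing the difference; the two-load cost shows up when the pointer genuinely lives in memory.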

-
It requires a *memory access* more, which may be prohibitive in a tight loop. Not all CPU operations are created equal. – Fred Foo Nov 02 '11 at 16:21
It does. It costs an extra fetch.
When a variable is accessed by value, it is read directly from its memory location.
Accessing it through a pointer adds the overhead of first fetching the variable's address from the pointer, then reading the value from that memory location.
Of course, this assumes the variable is not kept in a register, which it would be in some scenarios such as tight loops. I believe the question is asking about the overhead assuming no such scenario.

Assuming you're dealing with a real pointer (not a smart pointer of some sort), the dereference operation doesn't consume (data) memory at all. It does (potentially) involve an extra memory reference though: one to load the pointer itself, the other to access the data pointed to by the pointer.
If you're using a pointer in a tight loop, however, it'll normally be loaded into a register for the duration. In that case, the cost is mostly extra register pressure (i.e., a register holding the pointer can't hold something else at the same time). If an algorithm would otherwise exactly fill the registers, keeping a pointer in one can force something else to spill to memory, and that can make a difference. At one time that was a fairly big loss, but on most modern CPUs (with more registers and on-board cache) it's rarely a big issue. The obvious exception is an embedded CPU with fewer registers and no cache (and no on-chip memory).
The bottom line is that it's usually pretty negligible, often below the threshold where you can even measure it dependably.
