I am programming C++ using gcc on an obscure system called linux x86-64. I was hoping that may be there are a few folks out there who have used this same, specific system (and might also be able to help me understand what is a valid pointer on this system). I do not care to access the location pointed to by the pointer, just want to calculate it via pointer arithmetic.
According to section 3.9.2 of the standard:
A valid value of an object pointer type represents either the address of a byte in memory (1.7) or a null pointer.
And according to [expr.add]/4:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
And according to a stackoverflow question on valid C++ pointers in general:
Is 0x1 a valid memory address on your system? Well, for some embedded systems it is. For most OSes using virtual memory, the page beginning at zero is reserved as invalid.
Well, that makes it perfectly clear! So, besides NULL
, a valid pointer is a byte in memory, no, wait, it's an array element including the element right after the array, no, wait, it's a virtual memory page, no, wait, it's Superman!
(I guess that by "Superman" here I mean "garbage collectors"... not that I read that anywhere, just smelled it. Seriously, though, all the best garbage collectors don't break in a serious way if you have bogus pointers lying around; at worst they just don't collect a few dead objects every now and then. Doesn't seem like anything worth messing up pointer arithmetic for.).
So, basically, a proper compiler would have to support all of the above flavors of valid pointers. I mean, a hypothetical compiler having the audacity to generate undefined behavior just because a pointer calculation is bad would be dodging at least the 3 bullets above, right? (OK, language lawyers, that one's yours).
Furthermore, many of these definitions are next to impossible for a compiler to know about. There are just so many ways of creating a valid memory byte (think lazy segfault trap microcode, sideband hints to a custom pagetable system that I'm about to access part of an array, ...), mapping a page, or simply creating an array.
Take, for example, a largish array I created myself, and a smallish array that I let the default memory manager create inside of that:
#include <iostream>
#include <inttypes.h>
#include <assert.h>
using namespace std;
extern const char largish[1000000000000000000L];
asm("largish = 0");
int main()
{
char* smallish = new char[1000000000];
cout << "largish base = " << (long)largish << "\n"
<< "largish length = " << sizeof(largish) << "\n"
<< "smallish base = " << (long)smallish << "\n";
}
Result:
largish base = 0
largish length = 1000000000000000000
smallish base = 23173885579280
(Don't ask how I knew that the default memory manager would allocate something inside of the other array. It's an obscure system setting. The point is I went through weeks of debugging torment to make this example work, just to prove to you that different allocation techniques can be oblivious to one another).
Given the number of ways of managing memory and combining program modules that are supported in linux x86-64, a C++ compiler really can't know about all of the arrays and various styles of page mappings.
Finally, why do I mention gcc
specifically? Because it often seems to treat any pointer as a valid pointer... Take, for instance:
char* super_tricky_add_operation(char* a, long b) {return a + b;}
While after reading all the language specs you might expect the implementation of super_tricky_add_operation(a, b)
to be rife with undefined behavior, it is in fact very boring, just an add
or lea
instruction. Which is so great, because I can use it for very convenient and practical things like non-zero-based arrays if nobody is putzing with my add
instructions just to make a point about invalid pointers. I love gcc
.
In summary, it seems that any C++ compiler supporting standard linkage tools on linux x86-64 would almost have to treat any pointer as a valid pointer, and gcc
appears to be a member of that club. But I'm not quite 100% sure (given enough fractional precision, that is).
So... can anyone give a solid example of an invalid pointer in gcc linux x86-64? By solid I mean leading to undefined behavior. And explain what gives rise to the undefined behavior allowed by the language specs?
(or provide gcc
documentation proving the contrary: that all pointers are valid).