
I followed the discussion on One-byte-off pointer still valid in C?.

The gist of that discussion, as far as I could gather, was that if you have:

char *p = malloc(4);

Then it is OK to get pointers up to p+4 by using pointer arithmetic. If you get a pointer by using p+5, then the behavior is undefined.

I can see why dereferencing p+5 could cause undefined behavior. But undefined behavior using just pointer arithmetic?

Why would the arithmetic operators + and - not be valid operations? I don't see any harm in adding or subtracting a number from a pointer. After all, a pointer is represented by a number that captures the address of an object.

Of course, I was not in the standardization committee :) I am not privy to the discussions they had before codifying the standard. I am just curious. Any insight will be useful.

R Sahu
  • Dereferencing `p+4` in your example could also cause undefined behavior. – barak manos Mar 17 '14 at 21:09
  • Some of this stuff has its roots in niche or superseded architectures. There's probably a veteran that can tell you stories about architecture X where simple pointer arithmetic would translate to a certain CPU instruction that... – Jon Mar 17 '14 at 21:10
  • If this is correct (and I'm not sure it is), my guess would be that they were trying to allow for architectures and environments which might want to detect the pointer-math error at the point where it occurred rather than at the point where it was used. Again, undefined, which means there's no promise that it will fail ... just no promise that it will succeed. – keshlam Mar 17 '14 at 21:10
  • Actually, for C++, doesn't the whole STL iterator being inclusive of the begin, but exclusive of the end mean that end iterators are always off the end of the allocated memory (for array, and probably vector)? – pat Mar 17 '14 at 21:20
  • Well, actually you're only certain of what's going on at p, p+1, p+2 and p+3 (as you're allocating 4 spaces in memory). I believe malloc defines a pointer to the next memory address it will use, so p+4 might be defined (if I'm correct), but you can't know what's there, and it's dangerous to play with it (as your program might have used it after your malloc call) – Inox Mar 17 '14 at 21:22
  • @user3277173 You can still compare against p+4. – this Mar 17 '14 at 21:23
  • @self, which implies that you can safely calculate `p+4` in order to compare against it. – pat Mar 17 '14 at 21:25
  • @pat: correct. You can safely calculate p+4 but not p+5. A consequence is that an object cannot include the memory address corresponding to (intptr_t)(-1), even if that address actually exists on the hardware architecture. – rici Mar 17 '14 at 21:26
  • @self, yes, you can. But for most purposes you don't want to compare it to anything as you can't know beforehand what's in there. – Inox Mar 17 '14 at 21:26
  • well, modern OSes aren't the whole story for the C language... in the old Mac Classic days UB very often meant writing to memory that you didn't own and the machine going kablewey ... – Grady Player Mar 17 '14 at 21:27
  • @pat, the standard guarantees that computing `p+4` is a valid operation. Dereferencing `p+4` is obviously not. Performing arithmetic operations in the range `p`:`p+4` is also guaranteed to succeed. – R Sahu Mar 17 '14 at 21:27
  • @user3277173: you commonly compare against the one-past-the-end pointer in order to terminate loops. (eg. `iter != foo.end()`). The legality of one-past-the-end pointers is specifically to allow this idiom. – rici Mar 17 '14 at 21:30

4 Answers


The simplest answer is that it is conceivable that a machine traps integer overflow. If that were the case, then any pointer arithmetic which wasn't confined to a single storage region might cause overflow, which would cause a trap, disrupting execution of the program. C shouldn't be obliged to check for possible overflow before attempting pointer arithmetic, so the standard allows a C implementation on such a machine to just allow the trap to happen, even if chaos ensues.

Another case is an architecture where memory is segmented, so that a pointer consists of a segment address (with implicit trailing 0s) and an offset. Any given object must fit in a single segment, which means that valid pointer arithmetic can work only on the offset. Again, overflowing the offset in the course of pointer arithmetic might produce random results, and the C implementation is under no obligation to check for that.

Finally, there may well be optimizations which the compiler can produce on the assumption that all pointer arithmetic is valid. As a simple motivating case:

if (iter - 1 < object.end()) {...}

Here the test can be omitted because it must be true for any pointer iter whose value is a valid position in (or just after) object. The UB for invalid pointer arithmetic means that the compiler is not under any obligation to attempt to prove that iter is valid (although it might need to ensure that it is based on a pointer into object), so it can just drop the comparison and proceed to generate unconditional code. Some compilers may do this sort of thing, so watch out :)

Here, by the way, is the important difference between unspecified behaviour and undefined behaviour. Comparing two pointers (of the same type) with == is defined regardless of whether they are pointers into the same object. In particular, if a and b are two different objects of the same type, end_a is a pointer to one-past-the-end of a and begin_b is a pointer to b, then

end_a == begin_b

is unspecified; it will be 1 if and only if b happens to be just after a in memory, and otherwise 0. Since you can't normally rely on knowing that (unless a and b are array elements of the same array), the comparison is normally meaningless; but it is not undefined behaviour and the compiler needs to arrange for either 0 or 1 to be produced (and moreover, for the same comparison to consistently have the same value, since you can rely on objects not moving around in memory.)

rici

One case I can think of where the result of a + or - might give unexpected results is in the case of overflow or underflow.

The question you refer to points out that for p = malloc(4) you can compute p+4 for comparison. One thing this requires is a guarantee that p+4 won't overflow. There is no such guarantee for p+5.

That is to say, + and - themselves won't cause any problems, but there is a chance, however small, that they will return a value that is unsuitable for comparison.

Luke

Performing basic +/- arithmetic on a pointer will not cause a problem within the object's range. There, pointer values are ordered sequentially: &p[0] < &p[1] < ... < &p[n] for an array of n objects. But pointer arithmetic outside this range is not defined: &p[-1] may compare less than or greater than &p[0].

int *p = malloc(80 * sizeof *p);
int *q = p + 1000;  // undefined behavior: far outside the 80-element allocation
printf("p:%p q:%p\n", (void *) p, (void *) q);

Dereferencing a pointer outside its range, or even inside the range but misaligned, is a problem.

printf("*p:%d\n", *p);      // OK
printf("*p:%d\n", p[79]);   // OK
printf("*p:%d\n", p[80]);   // Bad, but &p[80] is still greater than &p[79]
printf("*p:%d\n", p[-1]);   // Bad, order of &p[-1] and &p[0] is not defined
printf("*p:%d\n", p[81]);   // Bad, order of &p[80] and &p[81] is not defined
char *r = (char *) p;
printf("*p:%d\n", *((int *) (r + 1)));  // Bad: misaligned access
printf("*p:%d\n", *q);      // Bad

Q: Why is p[81] undefined behavior?
A: Example: memory runs from 0 to N-1, and char *p has the value N-81. Then p[0] through p[79] are well defined, and so is &p[80] (the one-past-the-end address, N-1). &p[81] would need to have the value N to be consistent, but that overflows, so &p[81] may end up as 0, or who knows what.

chux - Reinstate Monica
  • The OP is correct that performing pointer arithmetic out of bounds invokes undefined behavior. The fact that it behaves as expected on most (if not all) platforms does not change that. – sepp2k Mar 17 '14 at 21:30

A couple of things here: the reason p+4 is valid in such a case is that computing a pointer to one past the last position is allowed.

p+5 would not be a problem in theory, but in my view the problem arises when you try to dereference p+5, or perhaps overwrite that address.

aghoribaba
  • I understand the part about dereferencing `p+5`. The question is why would computing `p+5` result in undefined behavior. – R Sahu Mar 17 '14 at 21:46
  • are you saying that you can't printf p+5? – aghoribaba Mar 17 '14 at 21:52
  • p+5, p+10, p+12; all are valid operations. You can go on printing as many addresses as you like. The only problem is referring to their values. Or maybe in some cases the address doesn't even exist. – aghoribaba Mar 17 '14 at 21:56
  • No, I am not. Since the behavior is undefined as per the standard, a compiler can choose sane behavior or otherwise. – R Sahu Mar 17 '14 at 21:56
  • It is undefined because of this very reason, that you might dereference it or overwrite it. You just allocated memory for four characters, then they expect you to just use that memory. If you wanted memory for 5 characters, allocate memory for 5 characters. – aghoribaba Mar 17 '14 at 22:00
  • @aghoribaba: That is *not* the reason this behaviour is undefined. `p+5` is *not* a valid operation. ("If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; *otherwise, the behavior is undefined.*"). The fact that it "works" with your compiler on your computer is irrelevant. – rici Mar 18 '14 at 00:30