27

Recently, I wrote some code to compare pointers like this:

if(p1+len < p2)

however, some staff said that I should write like this:

if(p2-p1 > len)

to be safe. Here,p1 and p2 are char * pointers,len is an integer. I have no idea about that.Is that right?

EDIT1: of course,p1 and p2 pointer to the same memory object at begging.

EDIT2:just one min ago,I found the bogo of this question in my code(about 3K lines),because len is so big that p1+len can't store in 4 bytes of pointer,so p1+len < p2 is true.But it shouldn't in fact,so I think we should compare pointers like this in some situation:

if(p2 < p1 || (uint32_t)p2-p1 > (uint32_t)len)
aurelien
  • 404
  • 4
  • 22
jiych.guru
  • 664
  • 1
  • 5
  • 20
  • 1
    I prefer your method. To me it's a lot more readable. – sje397 Jul 11 '13 at 03:15
  • @sje397 me too,but i'm not certainly it's right abosulte. – jiych.guru Jul 11 '13 at 03:19
  • 4
    @sje397: It doesn't matter how readable it is if it's incorrect. – Dietrich Epp Jul 11 '13 at 03:24
  • Till you don't access memory, for any address is just a number of size = `sizof(void*)` , can't cause of memory violation error. `p2-p1` **not safe** but preferable because avoids integer overflow (that is Undefined behavior according to `ISO C section 6.5 paragraph 5`). In-fact not for pointer but even for integer we should write `if(x - y < dif)` – Grijesh Chauhan Jul 11 '13 at 06:14
  • 3
    @GrijeshChauhan: That's not true. If (for example) you have a pointer `char *p` to a memory object of size `12` (so `p+11` points to its last byte), then the mere expression `p+13` invokes undefined behavior, even if absolutely nothing is done with the resulting invalid pointer. (By contrast, `p+12` is absolutely fine: you only get undefined behavior if you actually try to dereference it.) – ruakh Jul 11 '13 at 06:19
  • @ruakh ok you means it because of pointer arithmetic valid with array range **?** Why it so **?** – Grijesh Chauhan Jul 11 '13 at 09:25
  • 4
    @GrijeshChauhan: The standard does not require that pointers be mere integers identifying memory addresses. (For an extreme example, the C compiler for Lisp machines represents the C concept of "pointer" in terms of Lisp concepts.) The standard only requires compilers to be able to construct pointers for the three situations where it makes sense: null pointers, pointers inside allocated memory blocks, and pointers to one position past allocated memory blocks. – ruakh Jul 11 '13 at 14:52
  • @ruakh Excellent! where to find/read this: *`"pointers to one position past allocated memory blocks"` – Grijesh Chauhan Jul 11 '13 at 14:58
  • @GrijeshChauhan: You can download the standard for free via http://www.open-std.org/jtc1/sc22/wg14/www/standards.html (search the page for "latest publicly available version of the C99 standard"). This is defined in section 6.5.6, in point 8 on page 83. (By the way, for future reference: an informative list of cases of undefined behavior is in Appendix J.2. In this case, the relevant bullet-points are the last three on page 494.) – ruakh Jul 11 '13 at 18:28

6 Answers6

25

In general, you can only safely compare pointers if they're both pointing to parts of the same memory object (or one position past the end of the object). When p1, p1 + len, and p2 all conform to this rule, both of your if-tests are equivalent, so you needn't worry. On the other hand, if only p1 and p2 are known to conform to this rule, and p1 + len might be too far past the end, only if(p2-p1 > len) is safe. (But I can't imagine that's the case for you. I assume that p1 points to the beginning of some memory-block, and p1 + len points to the position after the end of it, right?)

What they may have been thinking of is integer arithmetic: if it's possible that i1 + i2 will overflow, but you know that i3 - i1 will not, then i1 + i2 < i3 could either wrap around (if they're unsigned integers) or trigger undefined behavior (if they're signed integers) or both (if your system happens to perform wraparound for signed-integer overflow), whereas i3 - i1 > i2 will not have that problem.


Edited to add: In a comment, you write "len is a value from buff, so it may be anything". In that case, they are quite right, and p2 - p1 > len is safer, since p1 + len may not be valid.

Marc.2377
  • 7,807
  • 7
  • 51
  • 95
ruakh
  • 175,680
  • 26
  • 273
  • 307
  • Even `p2-p1 > len` can explode: Assume `char` pointers on a 32 bit machine for simplicity and `p2 = 0xd0000000, p1 = 0x20000000, len = 1`. Now, `p2 - p1 = 0xb0000000` **which is interpreted as a *negative signed* integer**. As such, `p2-p1 > len` will be evaluated to false. Usually not a problem if the source of your values is benign, but if an attacker controls one of the values involved, such a bug can easily turn nasty. – cmaster - reinstate monica Jun 15 '19 at 07:21
  • The Standard only only requires that that implementations support the use of relational operators between pointers to parts of the same object, but the equality operator is also usable between pointers to different objects, or between pointers that point "just past" different objects, and is supposed to be usable between a pointer to one object and a pointer "just past" another, with the caveat that the result of the comparison would generally not be meaningful. In practice, the latter form may cause gcc to jump the rails and generate nonsensical code. – supercat Jun 15 '19 at 19:46
  • 1
    @oguz ismail, the edit in revision 4 was carefully [reviewed](https://stackoverflow.com/review/suggested-edits/23277320) before being approved. The author of the answer himself has approved it. It is correct, we find it useful, and old comments referencing the old code were cleaned up. I'm rolling it back; if you still find it necessary to revert again, please let's discuss it first - and not initiate an edit war. Thanks – Marc.2377 Jun 15 '19 at 20:39
12

"Undefined behavior" applies here. You cannot compare two pointers unless they both point to the same object or to the first element after the end of that object. Here is an example:

void func(int len)
{
    char array[10];
    char *p = &array[0], *q = &array[10];
    if (p + len <= q)
        puts("OK");
}

You might think about the function like this:

// if (p + len <= q)
// if (array + 0 + len <= array + 10)
// if (0 + len <= 10)
// if (len <= 10)
void func(int len)
{
    if (len <= 10)
        puts("OK");
}

However, the compiler knows that ptr <= q is true for all valid values of ptr, so it might optimize the function to this:

void func(int len)
{
    puts("OK");
}

Much faster! But not what you intended.

Yes, there are compilers that exist in the wild that do this.

Conclusion

This is the only safe version: subtract the pointers and compare the result, don't compare the pointers.

if (p - q <= 10)
Community
  • 1
  • 1
Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415
  • @Epp I think it's necessary to add `p > q` check – jiych.guru Jul 11 '13 at 05:27
  • @jiych.guru: Why? `p - q` results in a signed value. – Dietrich Epp Jul 11 '13 at 16:22
  • You wrote that the compiler knows that `ptr <= q` is true for all valid values of `ptr`, but there is no mentioning of `ptr` in your code samples. Did you mean `p` by any chance? – Frerich Raabe Jul 18 '13 at 09:01
  • @FrerichRaabe: I wrote "for all valid values of `ptr`", which is a way of introducing a new universally quantified variable. So, no, I did not mean `p`, because `p` already has a definition. – Dietrich Epp Jul 19 '13 at 03:32
  • @DietrichEpp: Ah, now I see what you mean. I usually see that the other way round (resembling the formal notation more closely), i.e. "for all valid values of ptr, ptr <= q is true". I now understand what you mean - given that `q` is the last valid value, `ptr <= q` certainly holds for all *valid* values of `ptr`. – Frerich Raabe Jul 19 '13 at 06:49
  • @jinawee: That paragraph says “In all other cases, the behavior is undefined.” – Dietrich Epp Feb 27 '19 at 08:01
9

Technically, p1 and p2 must be pointers into the same array. If they are not in the same array, the behaviour is undefined.

For the addition version, the type of len can be any integer type.

For the difference version, the result of the subtraction is ptrdiff_t, but any integer type will be converted appropriately.

Within those constraints, you can write the code either way; neither is more correct. In part, it depends on what problem you're solving. If the question is 'are these two elements of the array more than len elements apart', then subtraction is appropriate. If the question is 'is p2 the same element as p1[len] (aka p1 + len)', then the addition is appropriate.

In practice, on many machines with a uniform address space, you can get away with subtracting pointers to disparate arrays, but you might get some funny effects. For example, if the pointers are pointers to some structure type, but not parts of the same array, then the difference between the pointers treated as byte addresses may not be a multiple of the structure size. This may lead to peculiar problems. If they're pointers into the same array, there won't be a problem like that — that's why the restriction is in place.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
0

The existing answers show why if (p2-p1 > len) is better than if (p1+len < p2), but there's still a gotcha with it -- if p2 happens to point BEFORE p1 in the buffer and len is an unsigned type (such as size_t), then p2-p1 will be negative, but will be converted to a large unsigned value for comparison with the unsigned len, so the result will probably be true, which may not be what you want.

So you might actually need something like if (p1 <= p2 && p2 - p1 > len) for full safety.

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
0

As Dietrich already said, comparing unrelated pointers is dangerous, and could be considered as undefined behavior.

Given that two pointers are within the range 0 to 2GB (on a 32-bit Windows system), subtracting the 2 pointers will give you a value between -2^31 and +2^31. This is exactly the domain of a signed 32-bit integer. So in this case it does seem to make sense to subtract two pointers because the result will always be within the domain you would expect.

However, if the LargeAddressAware flag is enabled in your executable (this is Windows-specific, don't know about Unix), then your application will have an address space of 3GB (when run in 32-bit Windows with the /3G flag) or even 4GB (when run on a 64-bit Windows system). If you then start to subtract two pointers, the result could be outside the domain of a 32-bit integer, and your comparison will fail.

I think this is one of the reasons why the address space was originally divided in 2 equal parts of 2GB, and the LargeAddressAware flag is still optional. However, my impression is that current software (your own software and the DLL's you're using) seem to be quite safe (nobody subtracts pointers anymore, isn't it?) and my own application has the LargeAddressAware flag turned on by default.

Patrick
  • 23,217
  • 12
  • 67
  • 130
0

Neither variant is safe if an attacker controls your inputs

The expression p1 + len < p2 compiles down to something like p1 + sizeof(*p1)*len < p2, and the scaling with the size of the pointed-to type can overflow your pointer:

int *p1 = (int*)0xc0ffeec0ffee0000;
int *p2 = (int*)0xc0ffeec0ffee0400;
int len =       0x4000000000000000;
if(p1 + len < p2) {
    printf("pwnd!\n");
}

When len is multiplied by the size of int, it overflows to 0 so the condition is evaluated as if(p1 + 0 < p2). This is obviously true, and the following code is executed with a much too high length value.


Ok, so what about p2-p1 < len. Same thing, overflow kills you:

char *p1 = (char*)0xa123456789012345;
char *p2 = (char*)0x0123456789012345;
int len = 1;
if(p2-p1 < len) {
    printf("pwnd!\n");
}

In this case, the difference between the pointer is evaluated as p2-p1 = 0xa000000000000000, which is interpreted as a negative signed value. As such, it compares smaller then len, and the following code is executed with a much too low len value (or much too large pointer difference).


The only approach that I know is safe in the presence of attacker-controlled values, is to use unsigned arithmetic:

if(p1 < p2 &&
   ((uintptr_t)p2 - (uintptr_t)p1)/sizeof(*p1) < (uintptr_t)len
) {
    printf("safe\n");
}

The p1 < p2 guarantees that p2 - p1 cannot yield a genuinely negative value. The second clause performs the actions of p2 - p1 < len while forcing use of unsigned arithmetic in a non-UB way. I.e. (uintptr_t)p2 - (uintptr_t)p1 gives exactly the count of bytes between the bigger p2 and the smaller p1, no matter the values involved.

Of course, you don't want to see such comparisons in your code unless you know that you need to defend against determined attackers. Unfortunately, it's the only way to be safe, and if you rely on either form given in the question, you open yourself up to attacks.

Community
  • 1
  • 1
cmaster - reinstate monica
  • 38,891
  • 9
  • 62
  • 106