4

For the example below, what may cause undefined behavior? and why?

#include <cstddef> 
#include <iostream> 

template <typename Ty> 
bool in_range(const Ty *test, const Ty *r, size_t n) 
{ 
    return 0 < (test - r) && (test - r) < (std::ptrdiff_t)n; 
}

void f() { 
     double foo[10]; 
     double *x = &foo[0]; 
     double bar; 
     std::cout << std::boolalpha << in_range(&bar, x, 10);
}

I have not found the answer in When is pointer subtraction undefined in C?

Stargateur
  • 24,473
  • 8
  • 65
  • 91
Ahmad Siavashi
  • 979
  • 1
  • 12
  • 29
  • You *should* have found the answer over at the question you referenced, because the answer is the same for C++ as it is for C, and that answer is presented over there. – John Bollinger Dec 19 '17 at 17:00

3 Answers3

10

Pointer arithmetic, including the subtraction of two pointers, is only defined if the pointers point to elements within the same array, or one past the end of that array. In this context, a scalar counts as an array of size 1.

It's pretty pointless to allow pointer arithmetic in any other instance. To do that would unnecessarily constrain the C's memory model and could curtail its flexibility and ability to port to exotic architectures.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • 2
    The corollary, of course, is that an `in_range()` function such as the OP presents is almost completely pointless. You need to already know the answer to safely call the function. – John Bollinger Dec 19 '17 at 16:56
  • @coderredoc Stackoverflow remove last beta who was very useful to sort question but https://stackoverflow.com/questions/tagged/c?sort=active&pageSize=15 is enough for me. active question regroup new and update question and new and update answer to the related tag. You just need to wait to see a bar that tell you there are new event and click on it. – Stargateur Dec 19 '17 at 17:00
  • @Stargateur: That's the one! – Bathsheba Dec 19 '17 at 17:00
  • It's pretty point(er)less to allow pointer arithmetic in any other instance (pun intended) – dsp_user Dec 19 '17 at 17:01
  • @Stargateur.: If I follow this i will get new question fast? – user2736738 Dec 19 '17 at 17:01
  • Yes͏͏͏͏͏͏͏͏͏͏͏͏͏! – Bathsheba Dec 19 '17 at 17:01
  • @Stargateur Thanks...@Bathsheba Thanks – user2736738 Dec 19 '17 at 17:04
  • @Bathsheba: but note that C++ does place some restrictions (that aren't present in C) that require ordering between unrelated pointers (of the same type), even though arithmetic as such isn't supported. As shown in my answer, this allows his goal to be accomplished, even though the method he tried to use won't work. – Jerry Coffin Dec 19 '17 at 19:41
  • I guess it depends on what you and I each mean, @JerryCoffin. Your point that one can write a version of `in_range()` using `std::less` is well-taken, but we do seem to agree that one cannot write one based on pointer arithmetic or the `<` operator. Calling my claim *utterly* false, then, is exaggerated -- there is some truth to it. Furthermore, if I insist that "an `in_range()` function such as the OP presents" should be interpreted as one that, like the OP's, relies on pointer arithmetic and the `<` operator, then my claim is true in those terms, up to the limit of defined behavior. – John Bollinger Dec 19 '17 at 19:58
6

For the code, as you've written it, the answer is basically the same for C++ as it is for C: you get defined behavior if and only if the two pointers involved refer to parts of the same array, or one past its end (where, as @bathsheba already noted, a non-array object is treated as being the same as an array of one item).

C++ does, however, add one wrinkle that might be useful to know about here: even though neither subtraction nor ordered comparisons (e.g., <) is required to produce meaningful results when applied to pointers to separate objects, std::less<T> and friends, from <functional> are required to do so. So, given two separate objects like this:

Object a;
Object b;

...comparing the addresses of the two objects with the comparison objects must "yield a strict total order that is consistent among those specializations and is also consistent with the partial order imposed by the built-in operators <, >, <=, >=." (N4659, [comparisons]/2).

As such, you could write your function something like this:

template <typename Ty> 
bool in_range(const Ty *test, const Ty *begin, const Ty *end) 
{ 
    return std::less_equal<Ty *>()(begin, test) && std::less<Ty *>()(test, end);
}

If you really want to maintain the original function signature, you could do that as well:

template <typename Ty> 
bool in_range(const Ty *test, const Ty *r, size_t n) 
{ 
    auto end = r + n;
    return std::less_equal<Ty *>()(r, test) && std::less<Ty *>()(test, end);
}

[Note that I've written it using std::less_equal for the first comparison, and std:less for the second to match the typically expected semantics of C++, where the range is defined as [begin, end). ]

This does carry one proviso though: you need to ensure that r points to to the beginning of an array of at least n items1, or else the auto end = r + n; will produce undefined behavior.

At least for what I'd expect as the typical use-case for such a function, you can probably simplify usage a little but by passing the array itself, rather than a pointer and explicit length:

template <class Ty, size_t N>
bool in_range(Ty (&array)[N], Ty *test) {
    return  std::less_equal<Ty *>()(&array[0], test) && 
            std::less<Ty *>()(test, &array[0] + N);
}

In this case, you'd just pass the name of the array, and the pointer you want to test:

int foo[10];
int *bar = &foo[4];

std::cout << std::boolalpha << in_range(foo, bar) << "\n"; // returns true

This only supports testing against actual arrays though. If you attempt to pass a non-array item as the first parameter it simply won't compile:

int foo[10];
int bar;
int *baz = &foo[0];
int *ptr = new int[20];

std::cout << std::boolalpha << in_range(bar, baz) << "\n"; // won't compile
std::cout << std::boolalpha << in_range(ptr, baz) << "\n"; // won't compile either

The former probably prevents some accidents. The latter might not be quite as desirable. If we want to support both, we can do so via overloading (for all three situations, if we choose to):

template <class Ty, size_t N>
bool in_range(Ty (&array)[N], Ty *test) {
    return  std::less_equal<Ty *>()(&array[0], test) &&
            std::less<Ty *>()(test, &array[0]+ N);
}

template <class Ty>
bool in_range(Ty &a, Ty *b) { return &a == b; }

template <class Ty>
bool in_range(Ty a, Ty b, size_t N) {
    return std::less_equal<Ty>()(a, b) && 
           std::less<Ty>()(b, a + N);
}

void f() { 
     double foo[10]; 
     double *x = &foo[0]; 
     double bar;
     double *baz = new double[20];

     std::cout << std::boolalpha << in_range(foo, x) << "\n";
     std::cout << std::boolalpha << in_range(bar, x) << "\n";
     std::cout << std::boolalpha << in_range(baz, x, 20) << "\n";
}

1. If you want to get really technical, it doesn't have to point to the beginning of the array--it just has to point to part of an array that's followed by at least n items in the array.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • Shouldn't it be `std::less_equal()(begin, test)` (note the star) ? – qbolec Aug 20 '19 at 15:05
  • @qbolec: Yes, at least in most cases. I've edited. Thank you. – Jerry Coffin Aug 20 '19 at 16:44
  • 1
    I don't think the conditions of `std::less` are strong enough to guarantee that `in_range` works correctly. Given `int a[10], b[10];`, isn't it legal for the `std::less` ordering to be `a+0, b+0, a+1, b+1, a+2, b+2, ..., a+9, b+9, a+10, b+10` ? – Ben Voigt Oct 06 '22 at 20:16
  • @BenVoigt: and intriguing idea, but I don't think it works. If we look solely at the requirements on `std::less` itself, it could probably be made to work--but would be awfully difficult to fit in with the requirement that arrays be stored contiguously. I'm not feeling ambitious enough to reread that section at the moment to be sure whether it would violate the letter of the requirement, but at the very least, it seems to run solidly contrary to what I'd expect "contiguous storage" to mean. – Jerry Coffin Oct 07 '22 at 03:28
  • @JerryCoffin: But contiguous storage is defined in terms of `std::advance` being simple pointer arithmetic, and has nothing to do with `std::less`. – Ben Voigt Oct 07 '22 at 15:02
2

Undefined behavior in this case usually does not cause a crash, but a meaningless or inconsistent result.

In most modern architectures, subtracting 2 unrelated pointers just computes the difference of addresses divided by the size of the pointed type, approximately this:

    A *p1, *p2;
    ...
    ptrdiff_t diff = ((intptr_t)p2 - (intptr_t)p1) / (intptr_t)sizeof(*p1);

Examples of architectures where the behavior would be unexpected are Intel's 16 bit segmented medium and large models:

  • These models were once prevalent on PCs, before the 386 came along and its 32-bit model.
  • far pointers were stored in 2 parts: a 16-bit segment (or selector in protected mode) and a 16-bit offset.
  • comparing 2 pointers for equality required 2 separate compare instructions and conditional jumps for the segment and the offset.
  • comparing a pointer to NULL was usually optimized as a single comparison of the segment part with 0
  • subtracting 2 pointers and comparing for relative position was performed on the offset part only, making the silent assumption that both pointers pointed to the same array, hence had the same segment.
  • in your example, both objects have automatic storage, so they are both in the same segment, pointed to as SS, but for 2 objects allocated from the heap you could have p <= q && q <= p and p != q at the same time, or p - q == 0 with p != q which is covered by undefined behavior.
chqrlie
  • 131,814
  • 10
  • 121
  • 189