5

I am working with some code that checks if std::vector contains a given element in constant time by comparing its address to those describing the extent of the vector's data. However I suspect that, although it works, it relies on undefined behaviour. If the element is not contained by the vector then the pointer comparisons are not permitted.

bool contains(const std::vector<T>& v, const T& a) {
  return (v.data() <= &a) && (&a < v.data() + v.size());
}

Am I right in believing it is undefined behaviour? If so, is there any way to do the same thing without drastically changing the time complexity of the code?

Richard Forrest
  • 601
  • 1
  • 7
  • 9
  • Did you test your code? Unless you take an element from the vector (which will obviously be contained in the vector) this won't even work: https://godbolt.org/z/hrY1d4fox – mediocrevegetable1 Sep 23 '21 at 07:07
  • 2
    @mediocrevegetable1: Yes it is meant to check for containment by identity, not equality. – Richard Forrest Sep 23 '21 at 07:10
  • somewhere up the line you either took a reference to an element in the vector or to a different `T`, so maybe it is an option to use a `std::pair>` instead of the bare reference in the first place – 463035818_is_not_an_ai Sep 23 '21 at 07:24
  • @463035818-is-not-a-number: Yes that is a possibility. I could keep a record of which vector the reference was obtained from. However it adds a bit of (perhaps!) unnecessary complexity to the code. – Richard Forrest Sep 23 '21 at 07:45

2 Answers2

10

You can use std::less

A specialization of std::less for any pointer type yields the implementation-defined strict total order, even if the built-in < operator does not.


Update:

The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.

As pointed out in the comments, the standard only guarantees that std::less yields the implementation-defined strict total order, which is is consistent with the partial order imposed by the builtin operators. However, the standard doesn't guarantee the order of pointers pointing to different objects or arrays. Releated: https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095

One interesting thing is that there's a similar usage in Herb Sutter's gcpp library(link). There's a comment saying that it is portable, the library is experimental though.

    //  Return whether p points into this page's storage and is allocated.
    //
    inline
    bool gpage::contains(gsl::not_null<const byte*> p) const noexcept {
        //  Use std::less<> to compare (possibly unrelated) pointers portably
        auto const cmp = std::less<>{};
        auto const ext = extent();
        return !cmp(p, ext.data()) && cmp(p, ext.data() + ext.size());
    }
VainMan
  • 2,025
  • 8
  • 23
  • I understood this to mean that `std::less` simply bypasses any redefinition of the `operator<`. Wouldn't the undefined behaviour of the pointer comparison still remain? – Richard Forrest Sep 23 '21 at 07:36
  • 4
    @RichardForrest `std::less` invokes `operator<` on type T **unless specialized**. It is specialized for pointer type as the quoted text saying. This is also the reason that you can use something like `std::set` without relying on undefined behavior. – VainMan Sep 23 '21 at 07:43
  • That's great. This should work perfectly for my situation then. I was not expecting such a good outcome. Thanks. – Richard Forrest Sep 23 '21 at 07:56
  • 2
    The standard doesn't guarantee that this will actually work for `contains` though. If you have say two vectors `a` and `b`, the total order is permitted to be `&a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ...`, i.e., with the elements interleaved. – T.C. Sep 24 '21 at 00:10
  • @T.C. I don't known much about the standard. That seems correct if I read only sections directly related to std::less. I think that a new [question](https://stackoverflow.com/questions/69310688) is worth it. – VainMan Sep 24 '21 at 06:53
4

Yes, the comparisons as written are not permitted if the reference doesn't reference something that is already an element of the vector.

You can make the behavior defined by casting all pointers to uintptr_t and comparing those. This will work on all architectures with continuous memory (i.e. possibly not old 16-bit x86), although I don't know if the specific semantics are guaranteed.

As a side note, I would always interpret the name contains to be about the value, and thus be very surprised if the semantics are anything other than std::find(v.begin(), v.end(), a) != v.end(). Consider using a more expressive name.

Sebastian Redl
  • 69,373
  • 8
  • 123
  • 157
  • 6
    Comparing integers casted to pointers isn't technically guaranteed to yield the same result as comparing the pointers. The mapping from pointer to integer is implementation defined. – eerorika Sep 23 '21 at 07:19