1

In a comment by @MarcvanLeeuwen to another Q&A, it was suggested that the following is undefined behavior (UB):

template<class FwdIt, class T>
FwdIt find_before(FwdIt before_first, FwdIt last, T const& value)
{
    return std::adjacent_find(before_first, last, 
        [&](auto const& /* L */, auto const& R) { 
        // note that the left argument of the lambda is not evaluated
        return R == value; 
    });
}

auto list = std::forward_list<int> { 1, 2 };
auto it = find_before(list.before_begin(), list.end(), 1); // undefined behavior!?

The UB comes from the fact that the BinaryPredicate that is supplied to std::adjacent_find will dereference two adjacent iterators at a time, the first pair being list.before_begin() and list.begin(). Since before_begin() is not dereferencable, this could entail UB. On the other hand, the left argument is never used inside the lambda. One might argue that under the "as-if" rule it is unobservable whether the derefence actually took place, so that an optimizing compiler might elide it altogether.

Question: what does the Standard say about dereferencing an undereferencable iterator it, when the expression *it is not actually used and could be optimized away? (or more generally, is it UB when the offending expression is not used?)

NOTE: the offending code is easy to remedy by special casing the first iterator (as is done in the updated original Q&A).

Community
  • 1
  • 1
TemplateRex
  • 69,038
  • 19
  • 164
  • 304
  • So, what happens if the implementation uses checked iterators? I'd guess that dereferencing `before_begin` throws an exception (can throw).. – dyp Jun 27 '14 at 13:07
  • *"or more generally, is it UB when the offending expression is not used?"* As far as I can see, the result of the expression is not used. The expression itself is evaluated. If the lambda is not inlined, it even need to be bound to the reference. – dyp Jun 27 '14 at 13:09
  • @dyp what I really want is an overload for `std::adjacent_find` with a UnaryPredicate that takes the second argument ;-) – TemplateRex Jun 27 '14 at 13:13
  • 1
    [iterator.requirements.general]/5 "Values of an iterator `i` for which the expression `*i` is defined are called dereferenceable." -- The iterator returned by `before_begin` is not dereferenceable. Hence, `*i` itself is undefined. – dyp Jun 27 '14 at 13:13
  • 1
    I would say it is UB exactly for the reason that the Standard does not require dead code (or unobservable code) elimination and an implementation could be checked. Therefore, it needs to allow throwing exceptions or asserting, terminating.. if that expression is evaluated. For example, `void foo() { double d = 1.0 / 0; }` – dyp Jun 27 '14 at 13:19
  • 1
    @dyp It has nothing to do with checked iterators. The standard is explicit: in every case where an algorithm takes a range `[i,j)`, all of the elements in the range `[i,j)` must be dereferenceable, or there is undefined behavior. – James Kanze Jun 27 '14 at 13:26
  • @dyp [intro.execution]/1, footnote 5: "5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation **need not evaluate part of an expression if it can deduce that its value is not used** and that no side effects affecting the observable behavior of the program are produced." – TemplateRex Jun 27 '14 at 13:27
  • 2
    @TemplateRex *need not*, not *must not*. As I said: The Standard does not *require* dead code elimination (it merely *allows* it). UB is about portability/consistency, and portability is not guaranteed if it's merely allowed but not required. – dyp Jun 27 '14 at 13:28
  • 2
    @JamesKanze It does have to with checked iterators if you look further than "standard says no". Checked iterators are a reason *why* the behaviour cannot realistically be defined if any iterator in the given range cannot be dereferenced. –  Jun 27 '14 at 13:28
  • @TemplateRex You bet. [g++ @ `-O0`](http://coliru.stacked-crooked.com/a/2c46476d2900d519) [g++ @ `-O3`](http://coliru.stacked-crooked.com/a/6fe36f96c8faec9a) – dyp Jun 27 '14 at 13:34
  • @TemplateRex Keep in mind that always returning 1 is valid if the behaviour is undefined. Undefined behaviour does not require a diagnostic, crash, or anything else. –  Jun 27 '14 at 13:35
  • @hvd But not the only reason. I've worked on systems where all allocations when to the OS, and generating an address one before the allocation would cause a crash. This particular system was the motivating reason for the rules concerning pointers, which iterators have just adopted. If you do something like `int* p = malloc(sizeof(int) * 5); --p;` in C, you have undefined behavior as well. – James Kanze Jun 27 '14 at 14:00
  • @dyp I would say given [situations like this one](http://stackoverflow.com/a/24297811/1708801) that the optimizer is way more likely to work against you then with you with respect to undefined behavior. – Shafik Yaghmour Jun 27 '14 at 14:08
  • @JamesKanze Sure, but this question is about a rare type where the standard does require a `before_begin()` function, so there cannot be a problem that the iterator itself is not valid. –  Jun 27 '14 at 14:14
  • @hvd His function is a template. It can be called on any container. – James Kanze Jun 27 '14 at 14:19
  • @ShafikYaghmour Yeah, I know. But I find it easier to understand without trying to construct an optimization that could break it. Of course, checked iterators is not the only reason why this *should* be UB; but IMHO they explain it reasonably well. Whether the "strange" behaviour occurs because of additional checks, the architecture or optimizations doesn't really care here; additional checks are just something I can more easily reason about. – dyp Jun 27 '14 at 14:24
  • @JamesKanze I don't understand. You can call it for any container, but if the container does not have a `before_begin()` function, you cannot pass the result of `before_begin()` to this `find_before()` template function. And if for such a container, you attempt to decrement the result of `begin()`, the undefined behaviour is in the caller, not in this function. –  Jun 27 '14 at 14:29

3 Answers3

1

before_begin() yields a non-dereferenceable iterator; one for which applying unary * has undefined behavior, so certainly the above code has undefined behavior:

1.3.24 [defns.undefined]

undefined behavior
behavior for which this International Standard imposes no requirements

Now, it is certainly true that for some implementations of the library, it would be possible for the above code to have defined behavior; for example, in a simple implementation of forward_list, operator* on its iterator could have the effect of forming a T& reference to an uninitialised area of aligned storage:

template<typename T> struct forward_list {
    struct Node {
        Node* next = nullptr;
        typename aligned_storage<sizeof(T), alignof(T)>::type buf;
    };
    struct iterator {
        Node* node;
        T& operator*() { return *static_cast<T*>(static_cast<void*>(node->buf)); }
    };
    Node head;
    iterator before_begin() { return {&head}; }
};

In this scenario *(list.before_begin()) has defined behavior, as long as the result is only used in the "limited ways" allowed by 3.8p6.

However, any small change to this scenario could result in *(list.before_begin()) becoming undefined; for example, optimising for space an implementation could use a Node_base that omits storage and derive from it for the concrete list nodes. Or perhaps the aligned storage could contain an object wrapping T, so accessing the wrapped T falls foul of 3.8p6. Alternatively, a debug implementation could check for dereferencing the before-begin iterator and terminate your program.

By the as-if rule, the implementation is permitted to eliminate evaluations that have no visible side effects, but that doesn't result in your code having defined behavior. If the implementation debug-checks the dereference, program termination is certainly a side effect so is not eliminable.

More interestingly, if an operation with undefined behavior occurs on the way to forming a T& reference, then an optimising compiler is likely to use this to infer that the code paths leading up to the undefined behavior cannot occur. Per the reference implementation of adjacent_find, eliminating the code paths leading to the offending *first give the result that find_before will always return last.

ecatmur
  • 152,476
  • 27
  • 293
  • 366
1

23.3.4.3 forward_list iterators [forwardlist.iter]

iterator before_begin() noexcept;
const_iterator before_begin() const noexcept;
const_iterator cbefore_begin() const noexcept;

1 Returns: A non-dereferenceable iterator that, when incremented, is equal to the iterator returned by begin().
3 Remarks: before_begin() == end() shall equal false.

With sentence 3 in mind, if you call adjacent_find like you do it will dereference the first iterator if there is at least one element in the list (elem1: before_begin(), elem2: begin(), end: end()).

Sentence 1 says that the iterator returned by before_begin is not dereferencable. Not. In any case. Without any if or when. It is undefined behavior to dereference the iterator, the standard is not saying you can dereference the iterator and throw away the return value of *before_begin(). Keep in mind that adjacant_find needs to dereference the iterator so it can pass whatever the dereferencing returns to your predicate.

As always with undefined behavior the compiler is free to do whatever he wants. If the compiler sees in an optimization stage that he can inline your predicate and if the thinks that the left iterator does not need to be dereferenced because you do not use the returned value, he might generate code that just does so and hence does not crash. Generating code that works like you expect is one possibility in the case of undefined behavior, but you cannot count on it.

BTW, why would you want to adjacent_find with a predicate that looks only at the right side? Wouldn't you use a findthen?

Community
  • 1
  • 1
Werner Henze
  • 16,404
  • 12
  • 44
  • 69
  • thanks, the reasoning is the same as with the answer of ecatmur. The use case is explained in the linked question: it is to use `erase_after` on the iterator before the element found (so that the element found is removed). – TemplateRex Jun 30 '14 at 12:01
0

There is clearly undefined behavior here, supposing the argument names mean what they seem to mean, but it is in the callers code. There is no such thing as an iterator before first, and just attempting to create one is undefined behavior. The undefined behavior doesn't come from attempting to dereference the iterator; it comes from its very existance. (Of course, std::adjacent_find will dereference it, before calling your lambda, so if the iterator exists, and is not dereferenceable, that is also undefined behavior.)

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • Huh? `std::forward_list` has a *member function* `before_begin` which returns that iterator, as it is a singly-linked list. – dyp Jun 27 '14 at 13:17
  • `std::forward_list` has `before_begin` – TemplateRex Jun 27 '14 at 13:17
  • That's a special case introduced in C++11, a special exception to the rules for iterators. The general rules for iterators don't allow it, and none of the other containers allow it. – James Kanze Jun 27 '14 at 13:22