C++: Binary search tree end() iterator

Question

I have basic (no randomization, ordering etc) implementation of the BST. I want add iterators implementations and make the BST suitable for the ranged based for-loop. So I need begin(), end() member fucnctions and iterator incrementing.

I understand what begin() should do -return the iterator to the bottom-left-most node, and this thread discusses different possibilites for traversing the BST (=incrementing the iterator)

But the end() is supposed to give the iterator to the one-past-the-last element. And this is the actual question, that I don't understand, what is the meaning of that in the context of a BST?

How does `std::map` implement the `end()` iterator? Maybe you should emulate that behavior. — PaulMcKenzie, Nov 20 '14 at 15:49
@PaulMcKenzie - who's implementation of `std::map`? I don't know that different implementations use a different design, but I'm pretty sure the standard doesn't dictate a particular implementation (it doesn't even mandate the use of red-black trees IIRC) and I wouldn't be surprised if different choices were made. Of course there are probably good reasons, but also standard library sources probably aren't that easy to read. Not a criticism of style but standard library implementations have to cope with a lot of complications. — , Nov 20 '14 at 16:28

score 3 · Answer 1 · answered Nov 20 '14 at 15:48

3

The end iterator doesn't necessarily have to be one past the last element (that makes sense for vectors, but less so for trees eg.). It has to just be an iterator that can clearly be identified as not a valid iterator used for indicating that the end of the data structure is reached.

Practically, this can be done in several ways, depending on how your iterator refers to what it's pointing to. If it uses a pointer to a tree node eg., then a null pointer can be used for the end iterator.

answered Nov 20 '14 at 15:48

Sander De Dycker

16,053
1
35
40

True in a way, but I think you're thinking too much about implementation rather than the abstraction. The ordering isn't about addresses. The conventions of the interface order the iterators relative to each other, including `end`. For example if the iterator type is at least bidirectional, decrementing the `end` iterator must give an iterator to the final element. And of course the name reflects that intent - it's the `end` bound for the container. Of course it's more awkward for `unordered_map` etc, which have no logical ordering other than that defined by the hash and collision handling. – Nov 20 '14 at 16:05
1

@Steve314 : that's correct - I did interpret the question to be about implementation. Maybe that interpretation is wrong. In any case, decrementing the iterator just requires special treatment of the end iterator to make things work properly. – Sander De Dycker Nov 20 '14 at 16:42

kec · Answer 2 · 2014-11-20T18:00:10.300

1

A very simple scheme that uses two extra pointers-worth of memory is to simply overlay a doubly-linked, circular list on top of the BST. Your end() iterator then simply points to a sentinel node. It also makes your iterator increment/decrement very simple.

BST::iterator &
BST::iterator::operator++() {
  n = n->next;
  return *this;
}

etc. Note that using a sentinel like this means that the end iterator requires no special treatment. You can decrement it and get exactly the correct behavior.

edited Nov 20 '14 at 18:00

answered Nov 20 '14 at 17:48

kec

2,099
11
17

This approach is more common for multiway trees - the two pointers overhead is per-node, which can be quite a lot. A common approach with binary trees is to use [threaded binary trees](https://en.wikipedia.org/wiki/Threaded_binary_tree), which use the child pointers as in-order successor/predecessor pointers where there are no children. There's some overhead still to indicate which kind of link it is - child or succ/pred - but for balanced binary trees (red-black, AVL etc) one byte can handle all the jobs anyway. – Nov 20 '14 at 23:26
Of course that's not as simple, plus there's no obvious place for a sentinel in a threaded binary tree, but it's common enough for binary trees to have "fingers" - pointers to the in-order first and last nodes - anyway, avoiding the need for an O(log n) search for those frequently-referenced items. – Nov 20 '14 at 23:29

score 0 · Answer 3 · answered Nov 20 '14 at 16:24

Despite my comment, Sander De Dycker has the right idea. I have another way to think about it.

All containers that support iterators have a logical ordering. For vector the ordering is based on how the inserts were done - the index/subscript ordering. For map and set it's based on the key ordering. For multimap and multiset it's a bit of both. For unordered_map etc the claim is very tenuous, but I can still argue about hash algorithms and collision handling.

In a logical ordering, you can refer to ordered elements, but sometimes it makes sense to refer to the boundaries between each element. Logically (and in some cases even for the implementation) this works out fairly conveniently...

|     |     |     |     |     |     |     |     |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
| |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
|     |     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7     8

You decide where the zero "bound" goes independently of where the zero item goes, but you always get a simple addition/subtraction relationship. If the least bound is numbered the same as the least element, the last bound is numbered one more than the last element. Hence end as one past the final element.

In a binary tree implementation, each node can be considered to have two bounds - one either side of the element. In this scheme, every bound except begin and end occurs twice. You can represent bound 1 using the RHS of element 0 or the LHS or element 1. So in principle you can use a node pointer and a flag. Rather than have two representations for most bounds, though, you'll probably choose the most convenient one where possible - the one where you're not just referring to the right bound but also referring to the element you want to see when you dereference. That means the flag will only be set when referring to end, in which case you shouldn't support dereference anyway.

IOW following through this logic tells you that you don't really need to follow through this logic, though I think it's still a useful mental model. All you really need is an identifiable representation for end. Perhaps it's useful for that representation to include a pointer to the final pointer (as a starting point for e.g. decrementing that iterator). Perhaps there are situations where it's convenient to have pseudo-iterators internally that recognize the two equivalent bounds as distinct.

Similar but slightly different models and choices arise thinking about e.g. multiway trees where each node contains an array of elements.

Basically I think it's useful to mentally recognise bound positions as distinct but related to item positions, but that mental model shouldn't constraint your implementation choices - it may inspire alternatives but it's just a mental model.

C++: Binary search tree end() iterator

3 Answers3