Why does the Standard define `end()` as one past the end, instead of at the actual end?

- I'm guessing "because that's what the standard says" won't cut it, right? :) – Luchian Grigore Apr 01 '12 at 09:41
- @LuchianGrigore: Of course not. That would erode our respect for the (people behind the) standard. We should expect that there's a *reason* for the choices made by the standard. – Kerrek SB Apr 01 '12 at 09:47
- I guess this explanation also deserves your attention: [One Past the End](http://gcc.gnu.org/onlinedocs/libstdc++/manual/iterators.html#iterators.predefined.end) – SChepurin Apr 01 '12 at 11:04
- In short, computers don't count like people. But if you're curious as to why people don't count like computers, I recommend [The Nothing that Is: A Natural History of Zero](http://www.oup.com/us/companion.websites/0195128427/) for an in-depth look at the trouble humans had discovering that there is a number which is one less than one. – John McFarlane Apr 01 '12 at 19:51
- Because there's only one way to generate "the last one", and it is often not cheap because it has to be real. Generating "you fell off the end of the cliff" is *always* cheap; many possible representations will do. `(void*)"ahhhhhhh"` will do fine. – Hans Passant Apr 01 '12 at 23:55
- This is one of those rare questions where it *is* appropriate to talk about "the STL". The `[begin, end)` convention comes from the original HP/SGI version of the STL. – MSalters Apr 02 '12 at 13:02
- @MSalters: I believe it's present in the C Standard too, where it is legal to take the address of one-past-the-end of an array; that's AFAIK older than the STL and may have origins even earlier. – Puppy Apr 02 '12 at 13:11
- You know, I asked this question in a different form and got a very poor showing of results. It's all in how you ask, I guess. http://stackoverflow.com/questions/8441749/representing-intervals-or-ranges – Omnifarious Apr 03 '12 at 05:29
- Because you want to be able to specify the empty sequence as `[a,a)` and not `[a,a-1]`. The latter is just plain ugly. – Thomas Ahle Apr 24 '12 at 10:07
7 Answers
The best argument is easily the one made by Dijkstra himself:

- You want the size of the range to be a simple difference `end - begin`;
- Including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-one errors and extremely unpleasant and noisy code. For example, consider a partition `[n0, n1)[n1, n2)[n2, n3)`. Another example is the standard iteration loop `for (it = begin; it != end; ++it)`, which runs `end - begin` times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: with the half-open convention for ranges that we just established, if you are given a range of N elements (say, to enumerate the members of an array), then 0 is the natural "beginning", so that you can write the range as [0, N) without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number `1` everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.
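
To make the chaining concrete, here is a minimal sketch (my own illustration, not part of the original answer): adjacent half-open pieces share their boundary values, their sizes add up with no +1/-1 corrections, and the standard loop runs exactly `end - begin` times (zero times for an empty range).

```cpp
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(10);
    std::iota(v.begin(), v.end(), 0);          // fills v with 0, 1, ..., 9

    // Partition [0, 10) into [0, 4), [4, 7), [7, 10): the pieces chain
    // without any +1/-1 corrections, and their sizes simply add up.
    std::size_t n0 = 0, n1 = 4, n2 = 7, n3 = 10;
    std::cout << (n1 - n0) + (n2 - n1) + (n3 - n2) << '\n';   // prints 10 == n3 - n0

    // The standard iteration loop: runs exactly end - begin times.
    for (auto it = v.begin(); it != v.end(); ++it)
        std::cout << *it << ' ';
    std::cout << '\n';
}
```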

- The typical C for loop iterating over an array of size N is `for(i=0;i<N;i++)` … – Krazy Glew Apr 05 '12 at 16:31
- @KrazyGlew: I didn't put types in my loop example deliberately. If you think of `begin` and `end` as `int`s with values `0` and `N`, respectively, it fits perfectly. Arguably, it's the `!=` condition that's more natural than the traditional `<`, but we never discovered that until we started thinking about more general collections. – Kerrek SB Apr 05 '12 at 20:20
- @KerrekSB: I agree that "we never discovered that [!= is better] until we started thinking about more general collections." IMHO that is one of the things Stepanov deserves credit for – speaking as someone who tried to write such template libraries before the STL. However, I'll argue about `!=` being more natural – or, rather, I'll argue that `!=` has probably introduced bugs that `<` would catch. Think `for(i=0;i!=100;i+=3)`... – Krazy Glew Apr 06 '12 at 21:39
- @KrazyGlew: Your last point is somewhat off-topic, since the sequence {0, 3, 6, ..., 99} is not of the form that the OP asked about. If you wanted it to be thus, you should write a `++`-incrementable iterator template `step_by<3>`, which would then have the originally advertised semantics. – Kerrek SB Apr 08 '12 at 11:00
- @KrazyGlew: Even if `<` would sometimes hide a bug, **it is a bug anyway**. If someone uses `!=` when they should use `<`, then **it is** a bug. By the way, that kind of error is easy to find with unit testing or assertions. – Phil1970 May 12 '17 at 03:55
Actually, a lot of iterator-related stuff suddenly makes much more sense if you consider the iterators as pointing not at the elements of the sequence but in between them, with dereferencing giving access to the element immediately to the right. Then the "one past the end" iterator suddenly makes immediate sense:
```
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^               ^
|               |
begin           end
```
Obviously `begin` points to the beginning of the sequence, and `end` points to the end of the same sequence. Dereferencing `begin` accesses the element `A`, and dereferencing `end` makes no sense because there's no element to the right of it. Also, adding an iterator `i` in the middle gives
```
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^       ^       ^
|       |       |
begin   i       end
```
and you immediately see that the range of elements from `begin` to `i` contains the elements `A` and `B`, while the range from `i` to `end` contains the elements `C` and `D`. Dereferencing `i` gives the element to the right of it, that is, the first element of the second range.
Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:
```
+---+---+---+---+
| D | C | B | A |
+---+---+---+---+
^       ^       ^
|       |       |
rbegin  ri      rend
(end)   (i)     (begin)
```
I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to `i` (which I've named `ri`) still points in between elements `B` and `C`. However, due to the reversal of the sequence, element `B` is now the one to its right.
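
As a small illustration of this "pointing between elements" picture (my own sketch, not part of the answer): a reverse iterator and its base iterator mark the same gap, and they dereference to the elements on opposite sides of it.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

int main() {
    std::vector<char> v = {'A', 'B', 'C', 'D'};

    // i marks the gap between B and C; dereferencing yields the element
    // to the right of the gap.
    auto i = v.begin() + 2;
    assert(*i == 'C');

    // ri is the reverse iterator built on top of i: same gap, but
    // dereferencing yields the element to the left of the gap.
    auto ri = std::vector<char>::reverse_iterator(i);
    assert(*ri == 'B');

    // Both iterators mark the same in-between position.
    assert(ri.base() == i);
}
```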

- This is IMHO the best answer, though I think it might be better illustrated if the iterators pointed at numbers and the elements were between the numbers (the syntax `foo[i]` being a shorthand for the item immediately *after* position `i`). Thinking about it, I wonder if it might be useful for a language to have separate operators for "item immediately after position i" and "item immediately before position i", since a lot of algorithms work with pairs of adjacent items, and saying "the items on either side of position i" may be cleaner than "the items at positions i and i+1". – supercat May 22 '15 at 18:01
- @supercat: The numbers were not supposed to indicate iterator positions/indices, but to indicate the elements themselves. I'll replace the numbers with letters to make that clearer. Indeed, with the numbers as given, `begin[0]` (assuming a random access iterator) would access element `1`, since there's no element `0` in my example sequence. – celtschk Sep 17 '15 at 17:00
- Why is the word "begin" used rather than "start"? After all, "begin" is a verb. – user1741137 Oct 01 '19 at 11:42
- @user1741137 I think "begin" is meant as an abbreviation of "beginning" (which now makes sense). "beginning" being too long, "begin" sounds like a nice fit. "start" would conflict with the verb "start" (for example, when you have to define a function `start()` in your class to start a specific process, it would be annoying if it conflicted with an already existing name). – Fareanor Jun 24 '20 at 12:50
Why does the Standard define `end()` as one past the end, instead of at the actual end?

Because:

- It avoids special handling for empty ranges: for an empty range, `begin()` is equal to `end()`.
- It makes the end criterion simple for loops that iterate over the elements: the loops simply continue as long as `end()` is not reached (see the sketch below).
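
A tiny sketch of both points (my own illustration, under the usual container assumptions): for an empty container the loop body simply never runs, with no special-case code.

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;                       // empty: begin() == end()

    // The loop condition fails immediately, so the body runs zero times.
    for (auto it = v.begin(); it != v.end(); ++it)
        std::cout << *it << '\n';             // never reached

    std::cout << std::boolalpha
              << (v.begin() == v.end()) << '\n';   // prints "true"
}
```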

Because then

```cpp
size() == end() - begin()   // For iterators for which subtraction is valid
```

and you won't have to do awkward things like

```cpp
// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }
```

and you won't accidentally write erroneous code like

```cpp
bool empty() { return begin() == end() - 1; }  // a typo from the first version
                                               // of this post
                                               // (see, it really is confusing)

bool empty() { return end() - begin() == -1; } // Signed/unsigned mismatch
// Plus the fact that subtraction is also invalid for many iterators
```
Also: what would `find()` return if `end()` pointed to a valid element? Do you really want another member called `invalid()` which returns an invalid iterator?! Two iterators is already painful enough...

Oh, and see this related post.

Also: if `end()` pointed at the last element instead of one past it, how would you `insert()` at the true end?!
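
A short sketch of these last two points (my own illustration): `std::find` reports "not found" by returning `end()`, and `insert(end(), ...)` names the position after the last element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};

    // "Not found" is simply end() -- no separate invalid() sentinel needed.
    assert(std::find(v.begin(), v.end(), 42) == v.end());

    // insert() at end() appends after the last element; with an inclusive
    // end there would be no iterator naming that position.
    v.insert(v.end(), 4);
    assert(v.back() == 4);
}
```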

The iterator idiom of half-closed ranges `[begin(), end())` is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size:

```cpp
void func(int* array, size_t size)
```

Converting to half-closed ranges `[begin, end)` is very simple when you have that information:

```cpp
int* begin = array;
int* end   = array + size;

for (int* it = begin; it < end; ++it) { ... }
```
To work with fully-closed ranges, it's harder:

```cpp
int* begin = array;
int* end   = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }
```

Since pointers into arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call `std::find(array, array + size, some_value)` than it is to call `std::find(array, array + size - 1, some_value)`.
Plus, if you work with half-closed ranges, you can use the `!=` operator to check for the end condition, because (if your operators are defined correctly) `<` implies `!=`:

```cpp
for (int* it = begin; it != end; ++it) { ... }
```

However, there's no easy way to do this with fully-closed ranges. You're stuck with `<=`.

The only kind of iterator that supports `<` and `>` operations in C++ is a random-access iterator. If you had to write a `<=` operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd have fewer choices for creating less capable iterators (such as the bidirectional iterators on `std::list`, or the input iterators that operate on `iostreams`) if C++ used fully-closed ranges.
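
For instance (a minimal sketch of my own), `std::list`'s bidirectional iterators compile fine with the `!=` end test but provide no `<` or `<=` at all:

```cpp
#include <iostream>
#include <list>

int main() {
    std::list<int> l = {1, 2, 3};

    // Bidirectional iterators support == and !=, so the half-open
    // convention works for them unchanged.
    for (auto it = l.begin(); it != l.end(); ++it)
        std::cout << *it << ' ';
    std::cout << '\n';

    // A fully-closed convention would need something like
    //     for (auto it = l.begin(); it <= last; ++it)
    // but std::list's iterators have no operator<= to build on.
}
```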

With `end()` pointing one past the end, it is easy to iterate over a collection with a for loop:

```cpp
for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}
```
With `end()` pointing to the last element, a loop would be more complex:

```cpp
iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);
    if (it == collection.end())
        break;
    it++;
}
```

- If a container is empty, `begin() == end()`.
- C++ programmers tend to use `!=` instead of `<` (less than) in loop conditions, therefore having `end()` point to a position one past the end is convenient.
