I get the idea of the slicing operator in Python, but I am confused about "stop".

For instance:

lst = [1,2,3,4,5]
print(lst[0:4])

I think the answer should be [1,2,3,4,5], since slicing would stop at index 4, which holds the element 5. However, the correct answer is [1,2,3,4].

What is the explanation?

Peter Mortensen
Graz
  • `start` is the first item you want; `stop` is the first item you *don't* want. – kindall Feb 07 '23 at 19:59
  • The slice goes from `start` (inclusive) to `stop` (exclusive). This is just a design decision for the standard library. – Michael Butscher Feb 07 '23 at 20:00
  • The rationale is that `s[x:y] + s[y:z] == s[x:z]`, something that wouldn't be true if `s[x:y]` and `s[y:z]` both included `s[y]`. Half-open intervals are more convenient to work with. – chepner Feb 07 '23 at 20:02
  • BTW, `list` is the name of a type and should not be used as a variable name. And `print` probably needs parentheses. Whatever, we get the meaning. – Friedrich Feb 07 '23 at 20:05
  • @Friedrich +1, I have edited the question to use a different variable name and to fix the `print()` call. – shadowtalker Feb 07 '23 at 21:28

2 Answers

Python slices and ranges are said to have an "exclusive" upper bound, because the element at index stop is excluded from the result.

This behavior was chosen because exclusive (rather than inclusive) upper bounds work nicely with 0-indexed sequences, which is what Python uses in all of its core data sequence types (list, tuple, str, bytes, bytearray, and array.array).

It's one thing to memorize this, but it's another to understand why it makes sense!

In general, 0-indexing of arrays invites you to treat any position in the array as a cursor located between elements of the array. This design for arrays seems a little unusual at first, but it's actually a very ergonomic way to do things.

Consider the array:

values:    a   b   c   d   e   f
indexes:   0   1   2   3   4   5

We are taught that we count from 0, so the first position is index 0, the second position is index 1, and so on up to the final position, which is index len(data) - 1.

We can visualize this pattern as describing the position of a cursor located before the element of interest:

index 0: | a   b   c   d   e   f
index 1:   a | b   c   d   e   f
index 2:   a   b | c   d   e   f
index 3:   a   b   c | d   e   f
index 4:   a   b   c   d | e   f
index 5:   a   b   c   d   e | f

Numbering the cursor positions as 1 less than the element position is a deliberate reflection of this mental model.
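The cursor model can be checked directly. This is only a small sketch; the list `data` and the names `left_of_3` and `right_of_3` are made up for illustration. It shows that a cursor at position i has exactly i elements to its left:

```python
data = ["a", "b", "c", "d", "e", "f"]

# A cursor at position i sits just before element i,
# so exactly i elements lie to its left.
for i in range(len(data) + 1):
    left, right = data[:i], data[i:]
    assert len(left) == i
    assert left + right == data

# The cursor at index 3 splits the list between "c" and "d".
left_of_3 = data[:3]
right_of_3 = data[3:]
print(left_of_3)   # ['a', 'b', 'c']
print(right_of_3)  # ['d', 'e', 'f']
```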

There are historical reasons to have designed arrays this way, relating to memory addresses and pointers. Unless you are programming in C or C++, you can mostly ignore those reasons, because 0-indexing also works nicely as an abstract model of sequences. There is also a famous aesthetic argument in favor of 0-indexing.

Once we accept 0-indexing as comfortable and natural, this in turn affects how we think about ranges, because now the index of the final element in the array is index len(data) - 1. So if we want to select the entire array using an inclusive upper bound, we would have to write our range as 0 : len(data) - 1. That's ugly and clunky, so using exclusive upper bounds allows us to write 0 : len(data) instead. Now the upper bound of the range nicely coincides with the length of the data, while allowing us to use 0-indexing. This behavior appears in the range function, : syntax, and the slice class that : syntax represents.
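As a quick illustration (the list `data` and the variable names are invented for this sketch), the same exclusive upper bound shows up in all three places, and len(data) works as the upper bound in each:

```python
data = ["a", "b", "c", "d", "e", "f"]

# The ":" syntax: the upper bound equals the length, no "- 1" needed.
whole_by_colon = data[0:len(data)]

# The slice class is what the ":" syntax builds behind the scenes.
whole_by_slice = data[slice(0, len(data))]

# range() follows the same convention: it stops before len(data).
indices = list(range(0, len(data)))

# The last valid index is still len(data) - 1.
last = data[len(data) - 1]

print(whole_by_colon)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(indices)         # [0, 1, 2, 3, 4, 5]
print(last)            # f
```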

Furthermore, the exclusive upper bound reinforces the idea that a 0-based index is a cursor located before the current element.

Consider a range that selects c, d, and e. If you draw a box around the selected values, you'll notice that the rightmost/uppermost "edge" of the "box" is actually past the final value:

values:    a   b | c   d   e | f
indexes:   0   1 | 2   3   4 | 5
                 |-----------|

If we think in terms of cursors, the upper bound is located between e and f, which we know is index 5. We do not go past the upper bound; we stop before it. Therefore it's completely natural to write this range as 2:5. We start *before* index 2, include indices 2, 3, and 4, and then stop *before* we reach index 5. If we wrote the range as 2:4, that would lead to an inconsistent interpretation of indices as cursors, which would be confusing.
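Here is the same selection as a short sketch, together with the example from the question (the variable names are chosen only for illustration). A handy side effect of the exclusive bound is that stop - start gives the number of selected elements:

```python
data = ["a", "b", "c", "d", "e", "f"]

selected = data[2:5]   # cursors 2 and 5 enclose c, d, e
too_short = data[2:4]  # cursor 4 sits before e, so e is left out

print(selected)   # ['c', 'd', 'e']
print(too_short)  # ['c', 'd']

# With an exclusive upper bound, the length is simply stop - start.
assert len(selected) == 5 - 2

# The example from the question: index 4 (the element 5) is excluded.
lst = [1, 2, 3, 4, 5]
question_result = lst[0:4]
print(question_result)  # [1, 2, 3, 4]
```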

Finally, user chepner pointed out in a comment that exclusive ranges lead to an elegant property of slices of sequences. For a sequence x and indices a < b < c, x[a:b] + x[b:c] == x[a:c]. This property is not preserved when ranges are inclusive, because the element x[b] would then appear twice. As an exercise, use the "cursor" model to convince yourself that this property is preserved when ranges are exclusive and broken when ranges are inclusive.
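One way to work through the exercise is the sketch below. Note that `inclusive_slice` is a hypothetical helper written only for this comparison; Python has no built-in inclusive slicing.

```python
x = list("abcdef")
a, b, c = 1, 3, 5

# Exclusive upper bounds: adjacent slices join without overlap.
assert x[a:b] + x[b:c] == x[a:c]


def inclusive_slice(seq, start, stop):
    """Hypothetical helper that emulates an inclusive upper bound."""
    return seq[start:stop + 1]


# Inclusive upper bounds: x[b] lands in both halves and is duplicated.
joined = inclusive_slice(x, a, b) + inclusive_slice(x, b, c)
direct = inclusive_slice(x, a, c)

print(joined)  # ['b', 'c', 'd', 'd', 'e', 'f']
print(direct)  # ['b', 'c', 'd', 'e', 'f']
assert joined != direct
```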

Peter Mortensen
shadowtalker

This visualization may help:

[Image: visualization of list data]

Here's a very useful webpage on Python list slicing.

Peter Mortensen
Ben the Coder