64

I know that when I use range([start], stop[, step]) or slice([start], stop[, step]), the stop value is not included in the range or slice.

But why does it work this way?

Is it so that, e.g., range(0, x) or range(x) will contain exactly x elements?

Is it for parallelism with the C for loop idiom, i.e. so that for i in range(start, stop): superficially resembles for (i = start ; i < stop; i++) {?
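
For reference, both properties do hold:

len(range(5))       # 5 -- range(x) has exactly x elements
list(range(2, 5))   # [2, 3, 4] -- the same values as for (i = 2; i < 5; i++) in C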


See also Loop backwards using indices for a case study: setting the stop and step values properly can be a bit tricky when trying to get values in descending order.

Karl Knechtel
wap26
  • Closely related: [Numpy Indexing - Questions on Odd Behavior/Inconsistencies](http://stackoverflow.com/questions/9421057/numpy-indexing-questions-on-odd-behavior-inconsistencies) – Sven Marnach Jul 06 '12 at 14:53
  • Here's a discussion on why Python uses half-open intervals: https://groups.google.com/forum/?fromgroups#!msg/comp.lang.python/xfH2pQCH8iY/aPP7XZJNvwEJ – ecatmur Jul 06 '12 at 15:10
  • Regardless of why they're that way, you can always write your own similar ones that are inclusive if you need that functionality a lot. – martineau Jul 06 '12 at 16:49
  • Here's Edsger Dijkstra's lovely handwritten explanation of why the half-open zero-based interval convention is the best choice for computer programming: http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF – Russell Borogove Jul 06 '12 at 18:41
  • I'm too old to care about **why** anymore in this industry. If too many people have to ask why, then you're **probably** dealing with a religious war. What I wish for is an answer to **how** do I easily get the alternate behavior, since (outside of religious wars) reasonable people can and do differ. https://stackoverflow.com/questions/29596045/how-should-i-handle-inclusive-ranges-in-python – dreftymac Dec 05 '19 at 00:47

6 Answers

54

The documentation implies this has a few useful properties:

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters

Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

I think we can assume that the range functions act the same for consistency.
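
For example, a quick illustration of both properties:

word = 'Python'
i = 2
word[:i] + word[i:]   # 'Python' -- the invariant: s[:i] + s[i:] == s
len(word[1:3])        # 2 -- the length is the difference of the indices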

Toomai
  • One thing that tripped me up is that while for array x, x[-1] refers to the last element, x[-2:-1] does not refer to the last 2 elements, but rather just the second-to-last element. For Ruby programmers in particular, this is a common pitfall because you're used to having -1 be the last element and the .. notation is inclusive, i.e. x[-2..-1] returns the last 2 elements. The Python colon ':' is actually the Ruby triple-dot '...' – farhadf Jun 07 '16 at 21:32
35

Here's the opinion of Guido van Rossum:

[...] I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example, suppose you split a string into three parts at indices i and j -- the parts would be a[:i], a[i:j], and a[j:].

[Google+ is closed, so link doesn't work anymore. Here's an archive link.]
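
For instance, the adjacency invariant in action (the indices here are arbitrary):

a = 'abcdefgh'
i, j = 3, 6
a[:i], a[i:j], a[j:]            # ('abc', 'def', 'gh')
a[:i] + a[i:j] + a[j:] == a     # True -- adjacent slices recombine exactly, with no overlap or gap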

wjandrea
Nigel Tufnel
  • This is the only explanation I have seen that makes me feel better about it. This elegance is a non-arbitrary reason that finally gives me some peace. After all, it is called slicing and this makes it clear that the intent was for just that, not just subset selection. Thanks. – Jason Kelley Apr 01 '21 at 19:28
  • This explanation makes me feel a little bit better too; however, for a language designed to be readable, it might still be unforgivable... – user3761340 Oct 04 '21 at 18:56
23

Elegance vs. obviousness

To be honest, I think the way slicing works in Python is quite counter-intuitive: it trades obviousness for so-called elegance, at the cost of extra mental processing. That is probably why this Stack Overflow question has more than 2,000 upvotes; a lot of people simply don't understand it initially.

Just as an example, the following code has already caused headaches for a lot of Python newbies.

x = [1,2,3,4]
print(x[0:1])
# Output is [1]

Not only is it hard to process, it is also hard to explain properly; for example, the explanation of the code above would be "take the zeroth element up to, but not including, the first element".

Now look at Ruby, which uses an inclusive upper bound.

x = [1,2,3,4]
puts x[0..1]
# Output is [1,2]

To be frank, I really think the Ruby way of slicing is easier on the brain.

Of course, when you are splitting a list into two parts at a given index, the exclusive upper bound approach does result in better-looking code.

# Python
x = [1,2,3,4]
pivot = 2
print(x[:pivot]) # [1,2]
print(x[pivot:]) # [3,4]

Now let's look at the inclusive upper bound approach:

# Ruby
x = [1,2,3,4]
pivot = 2
puts x[0..(pivot-1)] # [1,2]
puts x[pivot..-1] # [3,4]

Obviously, the code is less elegant, but there is not much mental processing to be done here.
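
That said, if you want inclusive behaviour often, you can always wrap it yourself (inclusive_slice below is just an illustrative name, not anything built in):

def inclusive_slice(lst, start, stop):
    # Ruby-style slice: BOTH endpoints included; stop == -1 means "through the last element".
    return lst[start:] if stop == -1 else lst[start:stop + 1]

x = [1, 2, 3, 4]
print(inclusive_slice(x, 0, 1))   # [1, 2]
print(inclusive_slice(x, 2, -1))  # [3, 4]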

Conclusion

In the end, it is really a matter of elegance versus obviousness, and the designers of Python preferred elegance. Why? Because the Zen of Python states that "Beautiful is better than ugly."

Christoph Pader
Wong Jia Hau
  • I will agree that zero based indexes (ZBI) are *at first* not obvious. I recall many (many - no *MANY*) decades ago being a little confused by ZBI when I first learned to program. The problem wasn't the exclusive upper bound concept but rather the fact that the use of the concept wasn't explained. But once I figured this out its use became obvious! So perhaps "obvious" is in the eye of the beholder, or said another (more elegant :-) way: the obvious is that which is never seen until someone expresses it simply. It would be nice if Python textbooks and tutorials expressed this simply. – Jon Spencer Aug 05 '19 at 19:11
  • I prefer one-based indexing and closed intervals... it's stupid simple and it's real indexing. A zero-based THING with half-open intervals is just OFFSETS. Good for some situations like pointer arithmetic etc. Bad for normal use of arrays (and most high-level languages can't even do pointer arithmetic, so it's just a needless headache). – JsonKody Aug 24 '21 at 11:28
  • If you want, let's say, the second to fourth elements from arr = [1,2,3,4,5,6,7,8], you would slice arr[2:4]; or if you want the third element it would be arr[3]. Not arr[2], or arr[1:4] where 1 means second and 4 means fifth but not included -> that's not elegant, that's stupid. – JsonKody Aug 24 '21 at 11:32
  • @JsonKody Sounds to me you're actually talking about _counting_, not indexing. But if you "like more one-based indexing and closed intervals"… use BASIC. Better yet, Pascal; it will let you decide the starting index of each array separately. Zero? One? -57? All valid choices. – JuSTMOnIcAjUSTmONiCAJusTMoNICa Nov 17 '21 at 14:40
  • Wouldn't that contradict the Explicit is better than implicit Zen rule? – Rafs Jun 16 '22 at 10:15
  • I just want to add that in Ruby you can do: `x.each_slice(2){|l|puts l.to_s}` – phranz Sep 22 '22 at 11:52
12

A bit late to this question; nonetheless, this attempts to answer the "why" part of it:

Part of the reason is that we use zero-based indexing/offsets when addressing memory.

The easiest example is an array. Think of an "array of 6 items" as a location to store 6 data items. If this array's start location is at memory address 100, then data, let's say the 6 characters 'apple\0', are stored like this:

memory/
array      contains
location   data
 100   ->   'a'
 101   ->   'p'
 102   ->   'p'
 103   ->   'l'
 104   ->   'e'
 105   ->   '\0'

So for 6 items, the addresses go from 100 to 105. Addresses are generated using base + offset, so the first item is at base memory location 100 + offset 0 (i.e., 100 + 0), the second at 100 + 1, the third at 100 + 2, ..., and the last at 100 + 5.

This is the primary reason we use zero-based indexing, and it leads to language constructs such as for loops in C:

for (int i = 0; i < LIMIT; i++)

or in Python:

for i in range(LIMIT):

When you program in a language like C where you deal with pointers more directly, or assembly even more so, this base+offset scheme becomes much more obvious.

Because of the above, many language constructs automatically use this range from 0 (the start) to length - 1.
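
In Python terms, the offsets produced by range line up one-to-one with the valid indices:

word = 'apple'
for offset in range(len(word)):   # offsets 0 .. len-1, mirroring base + offset addressing
    print(offset, word[offset])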

You might find this article on Zero-based numbering on Wikipedia interesting, and also this question from Software Engineering SE.

Example:

In C, for instance, if you have an array ar and you subscript it as ar[3], that is really equivalent to taking the (base) address of the array ar and adding 3 to it, i.e., *(ar + 3). This leads to code like the following, which prints the contents of an array and shows the simple base + offset approach:

for(i = 0; i < 5; i++)
   printf("%c\n", *(ar + i));

really equivalent to

for(i = 0; i < 5; i++)
   printf("%c\n", ar[i]);
Solomon Ucko
Levon
  • That might explain why range(num) does not include the upper limit, since you could say num is just the length of the range, which is zero-based. It does not explain why range(lower, upper) does not include it, as we specifically requested that upper limit. – CodeMonkey Aug 06 '18 at 08:57
  • @YonatanNir It's the same reasoning, and for consistency. Otherwise, you'd have functions with the same name that differ in behavior based on whether default values have been provided or not. I.e., for increasing ranges, range(num) is really the same as range(0, num) and range(0, num, 1). It's easier all around (for API developers, and those using the API) to have consistent behavior. – Levon Aug 06 '18 at 12:13
10

Here is another reason why an exclusive upper bound is a saner approach:

Suppose you wished to write a function that applies some transform to a subsequence of items in a list. If intervals were to use an inclusive upper bound as you suggest, you might naively try writing it as:

def apply_range_bad(lst, transform, start, end):
     """Applies a transform on the elements of a list in the range [start, end]"""
     left = lst[0 : start-1]
     middle = lst[start : end]
     right = lst[end+1 :]
     return left + [transform(i) for i in middle] + right

At first glance, this seems straightforward and correct, but unfortunately it is subtly wrong.

What would happen if:

  • start == 0
  • end == 0
  • end < 0

? In general, there might be even more boundary cases that you should consider. Who wants to waste time thinking about all of that? (These problems arise because, with inclusive lower and upper bounds, there is no inherent way to express an empty interval.)
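
To make the pitfall concrete, here is what the naive version above actually returns for the start == 0 case (the values are illustrative):

lst = [1, 2, 3, 4]
# Intended: transform the inclusive range [0, 2].
# But lst[0 : 0-1] is lst[0:-1], i.e. everything except the last element,
# so elements end up duplicated and out of order:
apply_range_bad(lst, lambda x: x * 10, 0, 2)   # [1, 2, 3, 10, 20, 4]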

Instead, by using a model where upper bounds are exclusive, dividing a list into separate slices is simpler, more elegant, and thus less error-prone:

def apply_range_good(lst, transform, start, end):
     """Applies a transform on the elements of a list in the range [start, end)"""
     left = lst[0:start]
     middle = lst[start:end]
     right = lst[end:]
     return left + [transform(i) for i in middle] + right

(Note that apply_range_good does not transform lst[end]; it too treats end as an exclusive upper-bound. Trying to make it use an inclusive upper-bound would still have some of the problems I mentioned earlier. The moral is that inclusive upper-bounds are usually troublesome.)
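
For comparison, a quick (illustrative) check that the exclusive-bound version handles those same boundary cases without any special-casing:

lst = [1, 2, 3, 4]
apply_range_good(lst, lambda x: x * 10, 0, 2)   # [10, 20, 3, 4]
apply_range_good(lst, lambda x: x * 10, 0, 0)   # [1, 2, 3, 4] -- empty interval, nothing transformed
apply_range_good(lst, lambda x: x * 10, 2, 4)   # [1, 2, 30, 40]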

(Mostly adapted from an old post of mine about inclusive upper-bounds in another scripting language.)

jamesdlin
-1

This upper bound exclusion improves code understanding greatly. I hope it comes to other languages.

telepinu