243

I was going through the exercises in Ruby Koans and I was struck by the following Ruby quirk that I found really unexplainable:

array = [:peanut, :butter, :and, :jelly]

array[0]     #=> :peanut    #OK!
array[0,1]   #=> [:peanut]  #OK!
array[0,2]   #=> [:peanut, :butter]  #OK!
array[0,0]   #=> []    #OK!
array[2]     #=> :and  #OK!
array[2,2]   #=> [:and, :jelly]  #OK!
array[2,20]  #=> [:and, :jelly]  #OK!
array[4]     #=> nil  #OK!
array[4,0]   #=> []   #HUH??  Why's that?
array[4,100] #=> []   #Still HUH, but consistent with previous one
array[5]     #=> nil  #consistent with array[4] #=> nil  
array[5,0]   #=> nil  #WOW.  Now I don't understand anything anymore...

So why is array[5,0] not equal to array[4,0]? Is there any reason why array slicing behaves this weirdly when you start at the (length+1)th position?

Martijn Pieters
Pascal Van Hecke
  • 3
    See also [Why does array.slice behave differently for (length, n)](http://stackoverflow.com/questions/3219229/why-does-array-slice-behave-differently-for-length-n) – Phrogz Sep 08 '11 at 12:40
  • looks like the first number is the index to start at, second number is how many elements to slice – austin Jul 24 '14 at 21:59

10 Answers

196

Slicing and indexing are two different operations, and inferring the behaviour of one from the other is where your problem lies.

The first argument in slice identifies not the element but the places between elements, defining spans (and not elements themselves):

  :peanut   :butter   :and   :jelly
0         1         2      3        4

4 is still within the array, just barely; if you request 0 elements, you get the empty end of the array. But there is no index 5, so you can't slice from there.

When you do index (like array[4]), you are pointing at elements themselves, so the indices only go from 0 to 3.
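One way to see the two numberings side by side is to probe every position with both operations. This is only a sketch: the helper names `indexable_positions` and `sliceable_positions` are made up for illustration, and the nil check for indexing assumes the array holds no nil elements.

```ruby
# Hypothetical helpers (not part of Ruby) that probe which positions respond
# to each operation. Assumes the array itself contains no nil elements.
def indexable_positions(array, candidates)
  candidates.select { |i| !array[i].nil? }      # element numbering
end

def sliceable_positions(array, candidates)
  candidates.select { |i| !array[i, 0].nil? }   # fence-post numbering
end

array = [:peanut, :butter, :and, :jelly]
p indexable_positions(array, 0..5)   # => [0, 1, 2, 3]
p sliceable_positions(array, 0..5)   # => [0, 1, 2, 3, 4]
```

Position 4 shows up only in the second list: it is a valid fence post but not a valid element.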

sawa
Amadan
  • 9
    A good guess unless this is backed up by the source. Not being snarky, I'd be interested in a link if any just to explain the "why" like the OP and other commenters are asking. Your diagram makes sense except Array[4] is nil. Array[3] is :jelly. I would expect Array[4,N] to be nil but it's [] like the OP says. If it's a place, it's a pretty useless place because Array[4, -1] is nil. So you can't do anything with Array[4]. – squarism Dec 21 '10 at 22:15
  • 5
    @squarism I just got confirmation from Charles Oliver Nutter (@headius on Twitter) that this is the correct explanation. He's a big-time JRuby dev, so I'd consider his word pretty authoritative. – Hank Gay Apr 20 '11 at 00:52
  • 19
    The following is the justification for this behavior: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/380637 – Matt Briançon Aug 16 '11 at 16:17
  • 4
    Correct explanation. Similar discussions on ruby-core: http://redmine.ruby-lang.org/issues/4245 , http://redmine.ruby-lang.org/issues/4541 – Marc-André Lafortune Sep 07 '11 at 18:12
  • 19
    Also referred to as "fence-posting." The fifth fence-post (id 4) exists, but the fifth element does not. Slicing is a fence-post operation, indexing is an element operation. – Matty K Jun 21 '12 at 02:12
  • 2
    I found a more detailed explanation here (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/380636 - this discusses strings but I assume it's the same issue). Basically, range fence-posting allows text to be inserted between two characters. If ranges used character indexes, this would not be possible. And since a range can specify the space after the final character, you can append to a string like (for a 4-character string): str[4,0] = "appendme". It's not what I'm used to, but now that I know the reason I like it. – Stephen Feb 12 '13 at 08:13
  • 3
    Several months later while learning Python, I came across a helpful chart that illustrates this concept. It's for Python but you might find it helpful to make a similar chart for yourself in Ruby, as the concept is the same. Search the page for "Python indexes and slices:" http://wiki.python.org/moin/MovingToPythonFromOtherLanguages – Stephen Jul 29 '13 at 03:04
  • 1
    @Stephen: how does Python array slicing relate to the question? I find that it is quite different from Ruby slicing. The specific issue here, namely fence-posting, has absolutely no relevance in Python. – Manur Mar 11 '14 at 14:40
  • @Manur not true, array slices in Python are also based on fence-posting, not indexes, as you can see by following the link I gave. It's referred to there as "the spaces between the elements." Not every detail is the same, but as I said, some people may find it helpful to adapt that chart for Ruby for their reference. – Stephen Mar 11 '14 at 22:08
  • @Stephen Ok, I take back that it has 'absolutely no relevance', but Python doesn't exhibit the (at first) weird sometimes-empty-array/sometimes-nil behavior of Ruby which is the point of the question. – Manur Mar 12 '14 at 10:27
  • 2
    @Manur as I said, I simply thought people might find it helpful to make a similar chart to what I linked for their reference. The overall concept of fence-posting is the "why" behind the answer to this question, and I think the chart format may be helpful for grasping it. It's a minor contribution, and not worth the space we've spent debating it. – Stephen Mar 12 '14 at 19:13
  • Note that when you talk about `slice` in your answer, you're only talking about the 2 parameters variant, with `start` and `length`. `slice(index)` also exists, and doesn't fit your description. – Eric Duminil Jan 06 '17 at 19:45
  • @EricDuminil: You misunderstand: slicing in my answer refers to the operation that `slice(start, length)` does; indexing is the operation that `slice(index)` does. When I say "slicing", I am referring to the abstract operation, not of applying the Ruby `slice` method. – Amadan Jan 06 '17 at 21:20
  • Thanks for the answer. I'm referring to `The first argument in slice`, which could be ambiguous since it clearly is about Ruby `slice` method. – Eric Duminil Jan 06 '17 at 21:24
  • @EricDuminil: You said I am only talking of the two-parameter variant; and I am saying you are mistaken. `slice` and `[]` are fully synonymous in Ruby; I am talking about the single-argument version as "indexing" and two-argument version as "slicing". There is nothing left out in my description. – Amadan Jan 06 '17 at 21:48
  • I understand what you want to say in the comment and answer. I'm just saying that it's not clear from your answer that `slice` with one parameter is "indexing" and that `slice` with two parameters is "slicing". You also don't mention that `slice` and `[]` are fully synonymous. Just to be clear : I'm just trying to make it easier for a Ruby newcomer to understand your answer. – Eric Duminil Jan 06 '17 at 21:54
28

This has to do with the fact that slice returns an array. Relevant source documentation from Array#slice:

 *  call-seq:
 *     array[index]                -> obj      or nil
 *     array[start, length]        -> an_array or nil
 *     array[range]                -> an_array or nil
 *     array.slice(index)          -> obj      or nil
 *     array.slice(start, length)  -> an_array or nil
 *     array.slice(range)          -> an_array or nil

This suggests to me that if you give a start that is out of bounds, it will return nil. Thus, in your example, array[4,0] asks for the 4th element, which exists, but asks to return an array of zero elements, while array[5,0] asks for an index out of bounds, so it returns nil. This perhaps makes more sense if you remember that the slice method is returning a new array, not altering the original data structure.

EDIT:

After reviewing the comments I decided to edit this answer. Slice runs the following code when the argument count is two:

if (argc == 2) {
    if (SYMBOL_P(argv[0])) {
        rb_raise(rb_eTypeError, "Symbol as array index");
    }
    beg = NUM2LONG(argv[0]);
    len = NUM2LONG(argv[1]);
    if (beg < 0) {
        beg += RARRAY(ary)->len;
    }
    return rb_ary_subseq(ary, beg, len);
}

If you look in array.c, where the rb_ary_subseq function is defined, you see that it returns nil only when the start exceeds the array's length, not when it exceeds the last index:

if (beg > RARRAY_LEN(ary)) return Qnil;

This is what happens when 4 is passed in: the check compares 4 against the array's length of 4, so the nil return is not triggered, and the function goes on to return an empty array when the second argument is zero. If 5 is passed in, 5 is greater than the length, so it returns nil before the zero argument is evaluated. Code here at line 944.

I believe this to be a bug, or at least unpredictable and not in keeping with the 'Principle of Least Surprise'. When I get a few minutes I will at least submit a failing test patch to ruby core.

Jed Schneider
  • 2
    But... the element indicated by the 4 in array[4,0] doesn't exist either... - because it is actually the 5th element (0-based counting, see the examples). So it is out of bounds as well. – Pascal Van Hecke Aug 25 '10 at 20:56
  • 1
    you're right. I went back and looked at the source, and it looks like the first argument is handled inside the c code as the length, not the index. I will edit my answer, to reflect this. I think this could be submitted as a bug. – Jed Schneider Aug 26 '10 at 13:39
23

At least note that the behavior is consistent. From 5 on up everything acts the same; the weirdness only occurs at [4,N].

Maybe this pattern helps, or maybe I'm just tired and it doesn't help at all.

array[0,4] => [:peanut, :butter, :and, :jelly]
array[1,3] => [:butter, :and, :jelly]
array[2,2] => [:and, :jelly]
array[3,1] => [:jelly]
array[4,0] => []

At [4,0], we catch the end of the array. I'd actually find it rather odd, as far as beauty in patterns go, if the last one returned nil. Because of a context like this, 4 is an acceptable option for the first parameter so that the empty array can be returned. Once we hit 5 and up, though, the method likely exits immediately by nature of being totally and completely out of bounds.
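The pattern can be generated rather than typed out. A small sketch, where `tail_slices` is a hypothetical helper name and not anything built into Ruby:

```ruby
# Hypothetical helper: for each start position from 0 up to the array's
# length, take everything from there to the end, mirroring the pattern above.
def tail_slices(array)
  (0..array.length).map { |start| array[start, array.length - start] }
end

array = [:peanut, :butter, :and, :jelly]
tail_slices(array).each { |slice| p slice }
p array[5, 0]   # => nil, one step past the end of the pattern
```

The slices shrink by one element per step until the empty array at start 4, which is the natural last row of the table.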

Matchu
12

This makes sense when you consider that an array slice can be a valid lvalue, not just an rvalue:

array = [:peanut, :butter, :and, :jelly]
# replace 0 elements starting at index 4 (insert at end of array):
array[4,0] = [:sandwich]
# replace 0 elements starting at index 0 (insert at head of array):
array[0,0] = [:make, :me, :a]
# array is [:make, :me, :a, :peanut, :butter, :and, :jelly, :sandwich]

# this is just like replacing existing elements:
array[3, 4] = [:grilled, :cheese]
# array is [:make, :me, :a, :grilled, :cheese, :sandwich]

This wouldn't be possible if array[4,0] returned nil instead of []. However, array[5,0] returns nil because it's out of bounds (inserting after the 4th element of a 4-element array is meaningful, but inserting after the 5th element of a 4 element array is not).

Read the slice syntax array[x,y] as "starting after x elements in array, select up to y elements". This is only meaningful if array has at least x elements.
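That reading can be written down directly with `drop` and `take`. This is a sketch of the rule as stated, not how Ruby implements it; `slice_after` is a hypothetical name, and it only covers non-negative x and y.

```ruby
# Hypothetical model of array[x, y] for non-negative x and y:
# "starting after x elements, select up to y elements".
def slice_after(array, x, y)
  return nil if x > array.length   # fewer than x elements to start after
  array.drop(x).take(y)
end

array = [:peanut, :butter, :and, :jelly]
p slice_after(array, 2, 20)   # => [:and, :jelly]
p slice_after(array, 4, 0)    # => []
p slice_after(array, 5, 0)    # => nil
```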

Frank Szczerba
10

I found the explanation by Gary Wright very helpful as well. http://www.ruby-forum.com/topic/1393096#990065

The answer by Gary Wright is -

http://www.ruby-doc.org/core/classes/Array.html

The docs certainly could be more clear but the actual behavior is self-consistent and useful. Note: I'm assuming 1.9.X version of String.

It helps to consider the numbering in the following way:

  -4  -3  -2  -1    <-- numbering for single argument indexing
   0   1   2   3
 +---+---+---+---+
 | a | b | c | d |
 +---+---+---+---+
 0   1   2   3   4  <-- numbering for two argument indexing or start of range
-4  -3  -2  -1

The common (and understandable) mistake is to assume that the semantics of the single argument index are the same as the semantics of the first argument in the two argument scenario (or range). They are not the same thing in practice and the documentation doesn't reflect this. The error though is definitely in the documentation and not in the implementation:

single argument: the index represents a single character position within the string. The result is either the single character string found at the index or nil because there is no character at the given index.

  s = ""
  s[0]    # nil because no character at that position

  s = "abcd"
  s[0]    # "a"
  s[-4]   # "a"
  s[-5]   # nil, no characters before the first one

two integer arguments: the arguments identify a portion of the string to extract or to replace. In particular, zero-width portions of the string can also be identified so that text can be inserted before or after existing characters including at the front or end of the string. In this case, the first argument does not identify a character position but instead identifies the space between characters as shown in the diagram above. The second argument is the length, which can be 0.

s = "abcd"   # each example below assumes s is reset to "abcd"

To insert text before 'a':   s[0,0] = "X"           #  "Xabcd"
To insert text after 'd':    s[4,0] = "Z"           #  "abcdZ"
To replace first two characters: s[0,2] = "AB"      #  "ABcd"
To replace last two characters:  s[-2,2] = "CD"     #  "abCD"
To replace middle two characters: s[1..2] = "XX"    #  "aXXd"

The behavior of a range is pretty interesting. The starting point is the same as the first argument when two arguments are provided (as described above) but the end point of the range can be the 'character position' as with single indexing or the "edge position" as with two integer arguments. The difference is determined by whether the double-dot range or triple-dot range is used:

s = "abcd"
s[1..1]           # "b"
s[1..1] = "X"     # "aXcd"

s[1...1]          # ""
s[1...1] = "X"    # "aXbcd", the range specifies a zero-width portion of the string

s[1..3]           # "bcd"
s[1..3] = "X"     # "aX",  positions 1, 2, and 3 are replaced.

s[1...3]          # "bc"
s[1...3] = "X"    # "aXd", positions 1, 2, but not quite 3 are replaced.

If you go back through these examples and insist on using the single index semantics for the double or range indexing examples you'll just get confused. You've got to use the alternate numbering I show in the ascii diagram to model the actual behavior.
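Since every example assumes the string is reset to "abcd", it can help to run them through a helper that always starts from a fresh copy. A minimal sketch; `edit` is a hypothetical helper, not a String method:

```ruby
# Hypothetical helper: applies one index assignment to a fresh copy of "abcd"
# and returns the result, so every example starts from the same string.
def edit(*position, text:)
  s = "abcd".dup
  s[*position] = text
  s
end

p edit(0, 0, text: "X")     # => "Xabcd"
p edit(4, 0, text: "Z")     # => "abcdZ"
p edit(-2, 2, text: "CD")   # => "abCD"
p edit(1..1, text: "X")     # => "aXcd"
p edit(1...1, text: "X")    # => "aXbcd"
p edit(1..3, text: "X")     # => "aX"
p edit(1...3, text: "X")    # => "aXd"
```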

stack1
vim
  • 3
    Can you include the main idea of that thread? (in case of the link one day becomes invalid) – VonC Sep 25 '12 at 12:54
10

This does make sense

You need to be able to assign to those slices, so they are defined in such a way that the beginning and the end of the string have working zero-length expressions.

array[4, 0] = :sandwich
array[0, 0] = :crunchy
=> [:crunchy, :peanut, :butter, :and, :jelly, :sandwich]
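
For experimenting with what the second number does on assignment, here is a small sketch. `splice` is a hypothetical helper that works on a copy so each call starts from the same array; it is not a built-in method.

```ruby
# Hypothetical helper: performs array[start, length] = value on a copy and
# returns the copy, leaving the original array untouched.
def splice(array, start, length, value)
  copy = array.dup
  copy[start, length] = value
  copy
end

array = [:peanut, :butter, :and, :jelly]
p splice(array, 4, 0, :sandwich)        # => [:peanut, :butter, :and, :jelly, :sandwich]
p splice(array, 1, 2, [:x])             # => [:peanut, :x, :jelly]
p splice(array, 4, 5, [:love, :hope])   # => [:peanut, :butter, :and, :jelly, :love, :hope]
p splice(array, 5, 0, :foo)             # => [:peanut, :butter, :and, :jelly, nil, :foo]
```

The length says how many existing elements get replaced; at the end of the array there is nothing left to replace, so any length behaves like 0 there.
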
DigitalRoss
  • 1
    You can also assign to a position for which slice returns nil, so it would be useful to expand this explanation. `array[5,0]=:foo # array is now [:peanut, :butter, :and, :jelly, nil, :foo]` – mfazekas Jun 19 '14 at 01:34
  • what does the second number do when assigning? it seems to be ignored. ```[26] pry(main)> array[4,5] = [:love, :hope, :peace] => [:peanut, :butter, :and, :jelly, :love, :hope, :peace] ``` – Drew Verlee Jun 19 '15 at 00:08
  • @drewverlee it __isn’t__ ignored: `array = [:a, :b, :c, :d, :e]; array[1,2] = :x, :x; array => [:a, :x, :x, :d, :e]` – fanaugen Jul 07 '15 at 12:05
8

I agree that this seems like strange behavior, but even the official documentation on Array#slice demonstrates the same behavior as in your example, in the "special cases" below:

   a = [ "a", "b", "c", "d", "e" ]
   a[2] +  a[0] + a[1]    #=> "cab"
   a[6]                   #=> nil
   a[1, 2]                #=> [ "b", "c" ]
   a[1..3]                #=> [ "b", "c", "d" ]
   a[4..7]                #=> [ "e" ]
   a[6..10]               #=> nil
   a[-3, 3]               #=> [ "c", "d", "e" ]
   # special cases
   a[5]                   #=> nil
   a[5, 1]                #=> []
   a[5..10]               #=> []

Unfortunately, even their description of Array#slice doesn't seem to offer any insight as to why it works this way:

Element Reference—Returns the element at index, or returns a subarray starting at start and continuing for length elements, or returns a subarray specified by range. Negative indices count backward from the end of the array (-1 is the last element). Returns nil if the index (or starting index) are out of range.

Mark Rushakoff
7

An explanation provided by Jim Weirich

One way to think about it is that index position 4 is at the very edge of the array. When asking for a slice, you return as much of the array that is left. So consider the array[2,10], array[3,10] and array[4,10] ... each returns the remaining bits of the end of the array: 2 elements, 1 element and 0 elements respectively. However, position 5 is clearly outside the array and not at the edge, so array[5,10] returns nil.

suvankar
6

Consider the following array:

>> array=["a","b","c"]
=> ["a", "b", "c"]

You can insert an item at the beginning (head) of the array by assigning it to array[0,0]. To put the element between "a" and "b", use array[1,0]. Basically, in the notation array[i,n], i represents an index and n a number of elements. When n=0, it defines a position between the elements of the array.

Now if you think about the end of the array, how can you append an item to its end using the notation described above? Simple, assign the value to array[3,0]. This is the tail of the array.

So, if you try to access the element at array[3,0], you will get []. In this case you are still in the range of the array. But if you try to access array[4,0], you'll get nil as return value, since you're not within the range of the array anymore.
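To make the positions concrete, the sketch below lists every array you can get by inserting one item into a zero-length slice. `insertions` is a hypothetical helper name chosen for this example.

```ruby
# Hypothetical helper: inserts item at every position 0..array.length by
# assigning to a zero-length slice, each time on a fresh copy.
def insertions(array, item)
  (0..array.length).map do |i|
    copy = array.dup
    copy[i, 0] = item
    copy
  end
end

insertions(["a", "b", "c"], "X").each { |result| p result }
# => ["X", "a", "b", "c"]
# => ["a", "X", "b", "c"]
# => ["a", "b", "X", "c"]
# => ["a", "b", "c", "X"]
```

A 3-element array has 4 insertion points, which is why position 3 has to be valid for slicing even though there is no element there.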

Read more about it at http://mybrainstormings.wordpress.com/2012/09/10/arrays-in-ruby/ .

mu is too short
Tairone
2

tl;dr: in the source code in array.c, different functions are called depending on whether you pass 1 or 2 arguments in to Array#slice resulting in the unexpected return values.

(First off, I'd like to point out that I don't code in C, but have been using Ruby for years. So if you're not familiar with C, but you take a few minutes to familiarize yourself with the basics of functions and variables it's really not that hard to follow the Ruby source code, as demonstrated below. This answer is based on Ruby v2.3, but is more or less the same back to v1.9.)

Scenario #1

array.length == 4; array.slice(4) #=> nil

If you look at the source code for Array#slice (rb_ary_aref), you see that when only one argument is passed in (lines 1277-1289), rb_ary_entry is called, passing in the index value (which can be positive or negative).

rb_ary_entry then calculates the position of the requested element from the beginning of the array (in other words, if a negative index is passed in, it computes the positive equivalent) and then calls rb_ary_elt to get the requested element.

As expected, rb_ary_elt returns nil when the length of the array len is less than or equal to the index (here called offset).

if (offset < 0 || len <= offset) {
  return Qnil;
}

Scenario #2

array.length == 4; array.slice(4, 0) #=> []

However when 2 arguments are passed in (i.e. the starting index beg, and length of the slice len), rb_ary_subseq is called.

In rb_ary_subseq, if the starting index beg is greater than the array length alen, nil is returned:

long alen = RARRAY_LEN(ary);

if (beg > alen) return Qnil;

Otherwise the length of the resulting slice len is calculated, and if it's determined to be zero, an empty array is returned:

if (alen < len || alen < beg + len) {
  len = alen - beg;
}
klass = rb_obj_class(ary);
if (len == 0) return ary_new(klass, 0);

So since the starting index of 4 is not greater than array.length, an empty array is returned instead of the nil value that one might expect.
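For anyone who would rather read Ruby than C, here is a rough rendering of the two code paths quoted above. It is an illustrative model only: `model_elt` and `model_subseq` are hypothetical names borrowed from the C functions, negative arguments are simply rejected as a simplifying assumption, and the real implementation does more (for example, converting negative indices before it gets this far).

```ruby
# Hypothetical model of the one-argument path (rb_ary_elt in the C source).
def model_elt(ary, offset)
  len = ary.length
  return nil if offset < 0 || len <= offset   # offset == len is already out
  ary.fetch(offset)
end

# Hypothetical model of the two-argument path (rb_ary_subseq in the C source).
def model_subseq(ary, beg, len)
  alen = ary.length
  return nil if beg > alen                # beg == alen is still allowed
  return nil if beg < 0 || len < 0        # assumption: negatives not modelled
  len = alen - beg if alen < len || alen < beg + len   # clamp to what is left
  return [] if len == 0
  (beg...(beg + len)).map { |i| ary.fetch(i) }
end

array = [:peanut, :butter, :and, :jelly]
p model_elt(array, 4)         # => nil
p model_subseq(array, 4, 0)   # => []
p model_subseq(array, 5, 0)   # => nil
```

The whole difference comes down to the two comparisons: `len <= offset` on the indexing path versus `beg > alen` on the slicing path.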

Question answered?

If the actual question here isn't "What code causes this to happen?", but rather, "Why did Matz do it this way?", well you'll just have to buy him a cup of coffee at the next RubyConf and ask him.

Scott Schupbach