3

After reading this topic and after experimenting a bit, I am trying to understand how the Lua length operator works when a table contains nil values.

Before I started to investigate, I thought that the length was simply the number of consecutive non-nil elements, starting at index 1:

print(#{nil})         -- 0
print(#{"o"})         -- 1
print(#{"o",nil})     -- 1
print(#{"o","o"})     -- 2
print(#{"o","o",nil}) -- 2

That looks pretty simple, right?

But my headache started when I accidentally added an element after a nil-terminated table:

print(#{"o",nil,"o"})

My guess was that it should probably print 1 because it would stop counting when the first nil is found. Or maybe it should print 2 if the length operator is greedy enough to look for non-nil elements after the first nil. But the above code prints 3.

So I’ve ran several other tests to see what happens:

-- nil before the end
print(#{nil,"o"})     -- 2
print(#{nil,"o","o"}) -- 3
print(#{"o",nil,"o"}) -- 3

-- several nil elements
print(#{"o",nil,nil}) -- 1
print(#{nil,"o",nil}) -- 0
print(#{nil,nil,"o"}) -- 3

I should mention that repl.it currently uses Lua 5.1.5 which is rather old, but if you test with the Lua demo, which currently uses Lua 5.3.5, you’ll get the same results.

By looking at those results and by looking at this answer, I assume that:

  • if the last element is not nil, the length operator returns the full size of the table, including nil entries if any
  • if the last element is nil, it counts the number of consecutive non-nil and stops counting at the first nil

Are those assumptions correct?

Can we predict a 100% well-defined behavior when a table contains one or several nil values?

The Lua documentation states that the length of a table is only defined if the table is a sequence. Does that mean that the length operator has undefined behavior for non-sequences?

Apart from the length operator, can nil values cause any trouble in a table?

vdavid
  • 2,434
  • 1
  • 14
  • 15
  • https://stackoverflow.com/questions/23590885/why-does-luas-length-operator-return-unexpected-values – Andrew Kashpur Aug 06 '19 at 14:05
  • 1
    Possible duplicate of [Why does Lua's length (#) operator return unexpected values?](https://stackoverflow.com/questions/23590885/why-does-luas-length-operator-return-unexpected-values) – ad absurdum Aug 06 '19 at 14:10
  • "Can we predict a 100% well-defined behavior": Yes, intrinsic `#` will return an integer >= 0 on every table. – Tom Blodget Aug 06 '19 at 20:51
  • 2
    @TomBlodget `#setmetatable({}, {__len=function() return "popsickles" end})` – dualed Aug 11 '19 at 12:59
  • @dualed Yes, I was allowing for that with "intrinsic". I can only hope that no one thinks it's a good idea to create such a metamethod in practice. – Tom Blodget Aug 11 '19 at 14:56
  • @TomBlodget I thought so, but as there is no `"raw#"` function, the answer to the 100% question should still be no. (also the intrinsic behaviour of `#` includes metamethod handling, unless my information on intricacies of the intriguing word intrinsic is incomplete) – dualed Aug 11 '19 at 15:25
  • You're right. The guarantee would only apply if you protect your table from code that would add a __len metamethod. – Tom Blodget Aug 11 '19 at 15:38

3 Answers3

4

We can predict some behaviour, but it is not standardised, and as such you should never rely on it. It's quite possible that the behaviour may change within this major version of Lua.

Should you ever need to fill a table with nil values, I suggest wrapping the table and replace holes with a unique placeholder value (eg. NIL={}; if v==nil then t[k]=NIL end, this is quite cheap to test against and safe.).

That said...

As there is even a difference in the result of # depending on how the table is defined, you'll have to distinguish between statically defined (constant) tables and dynamic defined (muted) tables.

Static table definitions:

#{nil,nil,nil,nil,nil,  1} -- 6
#{3, 2, nil, 1} -- 4

#{nil,nil,nil,  1,  1,nil} -- 0
#{nil,nil,  1,  1,  1,nil} -- 5
#{nil,  1,  1,  1,  1,nil} -- 5
#{nil,nil,nil,nil,  1,nil} -- 0
#{nil,nil,  1,nil,  1,nil,nil} -- 5
#{nil,nil,nil,  1,nil,nil,  1,nil} -- 4

Using this kind of definition, as long as the last value is non-nil, you will get a length equal to the position of the last value. If the last value is nil, Lua starts a (non-linear) search from the tail until it finds the first non-nil value.

Dynamic data definition

local x={}; x[5]=1;print(#x) -- 0
local x={}; x[1]=1;x[2]=1;x[3]=1;x[5]=1;print(#x) -- 3
local x={}; x[1]=1;x[2]=1;x[4]=1;x[5]=1;print(#x) -- 5

#{[5]=1} -- 0
local x={nil,nil,nil,1};x[5]=1;print(#x) -- 0

As soon as the table was changed once, the operator works the other way (that includes static definitions with []). If the first element is nil, # always returns 0, but if not it starts a search that I did not investigate further (I guess you can check the sources, though I don't think it's a standard binary search), until it finds a nil value that is preceded by a non-nil value.

As said before, relying on this behaviour is not a good idea, and invites lots of issues down the road. Though if you want to make a nasty unmaintainable program to mess with a colleague, that's a sure way to do it.

dualed
  • 10,262
  • 1
  • 26
  • 29
  • Thank you for the detailed answer with concrete examples. I did not think of dynamic tables and it perfectly shows that it would be foolish to rely on the length operator in these cases. – vdavid Aug 13 '19 at 08:58
3

When a table is a sequence (all numeric keys start at 1 and there are no nil gaps), # is defined to be precisely the count of those elements.

For non-sequence tables, it is a bit more complicated. Lua 5.2 seems to leave the result as undefined. For 5.1 and 5.3, the result of the operation is a border.

A border in a table is any positive index that contains a non-nil value followed by nil, or 0 if the first element is nil. # is defined to return any value that satifies these conditions.

Looking at it from another perspective, since tables contain an "array" part and a "map" part, Lua has no way of knowing where the "map" indices start. For example, you can create a table with 1000 values and then set the first 999 of them to nil; that could leave you with a table of "size" 1000. However, you can also start with an empty table and set the 1000th element, having a table of "size" 0 but still structurally equivalent to the first one. The result of # is then simply the first valid value the internal algorithm finds.

IS4
  • 11,945
  • 2
  • 47
  • 86
2

The length operator produces undefined behaviour for tables that aren't sequences (i.e. tables with nil elements in the middle of the array). This means that even if the Lua implementation always behaves in a certain way, you shouldn't rely on that behaviour, as it may change in future versions of Lua, or in different implementations like LuaJIT.

You can use nils in tables - there is nothing wrong with that - just don't use the length operator on a table which might have nils before non-nil values.

The post you linked to contains more details about how the actual algorithm works. It mentions counting elements with a "binsearch", i.e. a binary search. This is not the same as just counting the elements one by one - if there are nils in the table, then depending on their exact position, the binary search algorithm may treat them as the end of the table, or may just ignore them.

To sum up, the algorithm is harder to predict than you were assuming, and even though it is technically possible to predict what will happen in any given case, you shouldn't rely on that behaviour as it is liable to change.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
Jack Taylor
  • 5,588
  • 19
  • 35