Let's unpack what's happening here.
First example: After the operations, `folder1 = {1, 2, 3, "test", 5}`. This is a proper sequence: all entries from key `1` to `5` are non-nil, so the length is `5`, just as expected.
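For a proper sequence the border is unique, so every implementation must agree on the length; a quick check (the table literal mirrors `folder1`):

```lua
local folder1 = {1, 2, 3, "test", 5}
-- no holes: the only border is 5, so #folder1 is fully determined
assert(#folder1 == 5)
```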
Second example: After the operations, `folder2 = {1, 2, 3, "test", nil, "test2"}`. This is not a proper sequence; it has a hole. Your explanation

> The third case is understandable since #table returns the last index before a nil index.

rests on a wrong definition, however: `#table` returns *any* index `i` such that `t[i] ~= nil` and `t[i+1] == nil` (a "border"). It is not guaranteed to be the last or the first such index, and in practice it often won't be, for performance reasons.[1] In the second example, both `4` and `6` are valid values for `#folder2`. You are getting `6`, so everything is fine here.
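You can check the border property directly; here is a small illustration (the literal mirrors `folder2`, though a constructor with an explicit `nil` may be laid out differently internally than step-by-step assignments):

```lua
local t = {1, 2, 3, "test", nil, "test2"}
-- whatever # returns, it must be a border: t[n] non-nil and t[n + 1] nil
local n = #t
assert(t[n] ~= nil and t[n + 1] == nil)
-- both 4 and 6 satisfy the border property for this table
for _, border in ipairs({4, 6}) do
    assert(t[border] ~= nil and t[border + 1] == nil)
end
```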
Third example: After the operations, `folder3 = {1, 2, 3, "test", nil, nil, "test2"}`. Both `4` and `7` are valid values for `#folder3`. The larger gap leads PUC Lua to evaluate `#folder3` to `7`, which comes down to implementation details.[1]
Your second set of examples can be explained analogously:

- `#{1, 2, 3, "test2"}` is `4`, as expected.
- `#{1, 2, 3, nil, "test2"}` may be either `3` or `5`; you get `3`, which is fine.
- `#{1, 2, 3, nil, nil, "test2"}` may be either `3` or `6`; whichever of the two you get is fine.
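Since both values are valid borders, portable code can only assert the disjunction, never a specific value:

```lua
local t = {1, 2, 3, nil, nil, "test2"}
-- implementation-defined: # must return some border, i.e. 3 or 6 here
assert(#t == 3 or #t == 6)
```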
[1] Lua tables are implemented using an array part and a hash part. Entries with positive integer keys won't always land in the array part, but Lua wants `{[3] = 3, [2] = 2, [1] = 1}` and `{1, 2, 3}` to behave the same from the programmer's point of view. Thus Lua needs to define `#t` even when the entries live in the hash part; it has to find the length somehow and can't just rely on an internal length field. Even for the array part that wouldn't work, since the programmer may set entries to `nil` in the middle of the array, and later at its end.

If Lua had to return the first or the last border, it would either have to do additional bookkeeping with significant overhead, or fall back to a linear search. Neither is a viable option.
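To illustrate the first point, here is a tiny check (where each table's entries actually end up internally is an implementation detail, hence the hedged comments):

```lua
local a = {1, 2, 3}                    -- entries typically in the array part
local b = {[3] = 3, [2] = 2, [1] = 1}  -- entries typically in the hash part
-- both are proper sequences with the unique border 3,
-- so # must agree regardless of internal layout
assert(#a == 3 and #b == 3)
```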
What Lua does instead is quite clever: it uses a binary search to find *any* border. If `t[i]` is `nil`, you know there is a border `j < i` such that `t[j + 1]` is `nil`. If `t[i]` is not `nil`, you know there is a border `j >= i` such that `t[j + 1]` is `nil`. Either way you can halve the search space in each step. To find an initial upper bound on `i`, you can check powers of two `i = 2^n` until you hit a `nil`; the maximum safe integer serves as a final upper bound.
Here is a reimplementation of the table length operator in pure Lua that I wrote a while ago, with some tests included to give you an idea:
```lua
local rawget = rawget
local maxint = math.maxinteger or 2^53 -- Lua < 5.3 has no integer type

function len(tbl)
    if tbl[1] == nil then
        return 0
    end
    if tbl[maxint] then
        return maxint
    end
    -- Quickly ramp the upper bound up. Could start with high = maxint.
    local low, high = 1, 1
    while rawget(tbl, high) do
        low, high = high, high * 2
        if low > high then -- overflow
            high = maxint
            break
        end
    end
    -- Binary search for a border in [low, high]; invariant: tbl[low] ~= nil.
    while low < high do
        -- equivalent to math.ceil((low + high) / 2), but can't overflow
        local mid = low + math.ceil((high - low) / 2)
        if tbl[mid] == nil then
            high = mid - 1
        else
            low = mid
        end
    end
    return low
end

assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
    t[2^i] = i
end
print(len(t))
```
Lua itself can get tighter upper bounds in less time, since it knows the capacities of the array and hash parts and can use those as bounds; it doesn't have to do the "try powers of two" trick.