About tables in general
(oh, can't you just give me an array)
In Lua, a table is the single general-purpose data structure. Table keys can be of any type, like number
, string
, boolean
. Only nil
keys aren't allowed.
Whether tables can or can't contain nil
values is a surprisingly difficult question which I tried to answer in depth here. Let's just assume that setting t[k] = nil
should be the observably the same as never setting k
at all.
Table construction syntax (like t2 = {10, 20, nil, 40}
) is a syntactic sugar for creating a table and then setting its values one by one (in this case: t2 = {}
, t2[1] = 10
, t2[2] = 20
, t2[3] = nil
, t2[4] = 40
).
Tables as arrays
(oh, from this angle it really looks quite arrayish)
As tables are the only complex data structure in Lua, the language (for convenience) provides some ways for manipulating tables as if they were arrays.
Notably, this includes the length operator (#t
) and many standard functions, like table.insert
, table.remove
, and more.
The behavior of the length operator (and, in consequence, the mentioned utility functions) is only defined for array-like tables with a particular set of keys, so-called sequences.
Quoting the Lua 5.2 Reference manual:
the length of a table t is only defined if the table is a sequence, that is, the set of its positive numeric keys is equal to {1..n} for some integer n
As a result, the behavior of calling #t
on a table not being a sequence at that time, is undefined.
It means that any result could be expected, including 0
, -1
, or false
, or an error being raised (unrealistic for the sake of backwards compatibility), or even Lua crashing (quite unrealistic).
Indirectly, this means that the behavior of utility functions that expect a sequence is undefined if called with a non-sequence.
Sequences and non-sequences
(it's really not obvious)
So far, we know that using the length operator on tables not being sequences is a bad idea. That means that we should either do that in programs that are written in a particular way, that guarantees that those tables will always be sequences in practice, or, in case we are provided with a table without any assumptions about their content, we should dynamically ensure they are indeed a sequence.
Let's practice. Remember: positive numeric keys have to be in the form {1..n}, e.g. {1}, {1, 2, 3}, {1, 2, 3, 4, 5}, etc.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
Sequence. Easy.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
t[5] = false
Not a sequence. {1, 2, 3, 5} is missing 4.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
t[4] = nil
t[5] = false
Not a sequence. nil
values aren't considered part of the table, so again we're missing 4.
t = {}
t[1] = 123
t[2] = "bar"
t[3.14] = 456
t[4] = nil
t[5] = false
Not a sequence. 3.14
is positive, but isn't an integer.
t = {}
t[0] = "foo"
t[1] = 123
t[2] = "bar"
Sequence. 0
isn't counted for the length and utility functions will ignore it, but this is a valid sequence. The definition only gives requirements about positive number keys.
t = {}
t[-1] = "foo"
t[1] = 123
t[2] = "bar"
Sequence. Similar.
t = {}
t[1] = 123
t["bar"] = "foo"
t[2] = "bar"
t[false] = 1
t[3] = 0
Sequence. We don't care about non-numeric keys.
Diving into the implementation
(if you really have to know)
But what happens in C implementation of Lua when we call #
on a non-sequence?
Background: Tables in Lua are internally divided into array part and hash part. That's an optimization. Lua tries to avoid allocating memory often, so it pre allocates for the next power of two. That's another optimization.
- When the last item in the array part is
nil
, the result of #
is the length of the shortest valid sequence found by binsearching the array part for the first nil-followed key.
- When the last item in the array part is not
nil
AND the hash part is empty, the result of #
is the physical length of the array part.
- When the last item in the array part is not
nil
AND the hash part is NOT empty, the result of #
is the length of the shortest valid sequence found by binsearching the hash part for for the first nil-followed key (that is such positive integer i
that t[i] ~= nil
and t[i+1] == nil
), assuming that the array part is full of non-nils(!).
So the result of #
is almost always the (desired) length of the shortest valid sequence, unless the last element in the array part representing a non-sequence is non-nil. Then, the result is bigger than desired.
Why is that? It seems like yet another optimization (for power-of-two sized arrays). The complexity of #
on such tables is O(1)
, while other variants are O(log(n))
.