21

I'm developing a simple optimized JSON function. Lua uses tables to represent arrays but in JSON I need to recognize between them. The code below is used:

t={
    a="hi",
    b=100
}

function table2json(t,formatted)
if type(t)~="table" then return nil,"Parameter is not a table. It is: "..type(t)    end

local ret=""--return value
local lvl=0 --indentation level
local INDENT="  " --OPTION: the characters put in front of every line for indentation
function addToRet(str) if formatted then ret=ret..string.rep(INDENT,lvl)..str.."\n" else ret=ret..str end end

addToRet("{")
lvl=1
for k,v in pairs(t) do
    local typeof=type(v)
    if typeof=="string" then
        addToRet(k..":\""..v.."\"")
    elseif typeof=="number" then
        addToRet(k..":"..v)
    end
end
lvl=0
addToRet("}")

return ret
end

print(table2json(t,true))

As you can see in JSON reference an object is what is called a table in Lua and it's different from an array.

The question is how I can detect if a table is being used as an array?

  • One solution of course is to go through all pairs and see if they only have numerical consecutive keys but that's not fast enough.
  • Another solution is to put a flag in the table that says it is an array not an object.

Any simpler/smarter solution?

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
AlexStack
  • 16,766
  • 21
  • 72
  • 104
  • See: http://stackoverflow.com/questions/6077006/how-can-i-check-if-a-lua-table-contains-only-sequential-numeric-indices/6080274#6080274 – BMitch Sep 23 '11 at 14:19

10 Answers10

17

If you want fast, simple, non-intrusive solution that will work most of the times, then I'd say just check index 1 - if it exists, the table is an array. Sure, there's no guarantee, but in my experience, tables rarely have both numerical and other keys. Whether it's acceptable for you to mistake some objects for arrays and whether you expect this to happen often depend on your usage scenario - I guess it's not good for general JSON library.

Edit: For science, I went to see how Lua CJSON does things. It goes through all pairs and checks if all keys are integers while keeping the maximum key (the relevant function is lua_array_length). Then it decides whether to serialize the table as an array or object depending on how sparse the table is (the ratio is user controlled) i.e. a table with indices 1,2,5,10 will probably be serialized as an array while a table with indices 1,2,1000000 will go as an object. I guess this is actually quite good solution.

sbk
  • 9,212
  • 4
  • 32
  • 40
7

The simplest algorithm to differentiate between arrays/non-arrays is this one:

local function is_array(t)
  local i = 0
  for _ in pairs(t) do
      i = i + 1
      if t[i] == nil then return false end
  end
  return true
end

Explanation here: https://web.archive.org/web/20140227143701/http://ericjmritz.name/2014/02/26/lua-is_array/

That said, you will still have issues with empty tables - are they "arrays" or "hashes"?

For the particular case of serializing json, what I do is marking arrays with a field in their metatable.

-- use this when deserializing
local function mark_as_array(t)
  setmetatable(t, {__isarray = true})
end

-- use this when serializing
local function is_array(t)
  local mt = getmetatable(t)
  return mt.__isarray
end
L. Miller
  • 347
  • 1
  • 2
  • 6
kikito
  • 51,734
  • 32
  • 149
  • 189
6

Here is more simplest check based on Lua specific #len function mechanism.

function is_array(table)
  if type(table) ~= 'table' then
    return false
  end

  -- objects always return empty size
  if #table > 0 then
    return true
  end

  -- only object can have empty length with elements inside
  for k, v in pairs(table) do
    return false
  end

  -- if no elements it can be array and not at same time
  return true
end

local a = {} -- true
local b = { 1, 2, 3 } -- true
local c = { a = 1, b = 1, c = 1 } -- false
feod0r
  • 69
  • 1
  • 1
  • 2
    Useful. You could shorten this a bit: `function is_array(tbl) return type(tbl) == 'table' and (#tbl > 0 or next(tbl) == nil) end`. – tarleb Oct 08 '18 at 14:24
  • 1
    Just a heads up that is solution is non-exhaustive and fails for any table that contains an array component in addition to a non-array component e.g. `{[1] = 1, a = 1}`, incorrectly indicating that the following is an array. It'd be quicker and just as accurate (i.e. not very) to simply use `function is_array(tbl) return type(tbl) == 'table' and tbl[1] ~= nil end` – Benjamin Dobell Jan 11 '20 at 14:25
  • Note that the function in BenjaminDobell's comment returns false for empty tables, unlike this answer. – MarredCheese Feb 02 '22 at 17:27
4

@AlexStack

if not t[i] and type(t[i])~="nil" then return false end

This code is wrong, if fails when one of the elemets is false.

> return  isArray({"one", "two"})
true
> return  isArray({false, true})
false

I think the whole expression can be changed to type(t[i]) == nil but it still will fail in some scenarios because it will not support nil values.

A good way to go, I think, is trying with ipairs or checking whether #t is equal to count, but #t returns 0 with objects and count will be zero with empty arrays, so it may need an extra check at the beginning of the function, something like: if not next(t) then return true.

As a sidenote, I'm pasting another implementation, found in lua-cjson (by Mark Pulford):

-- Determine with a Lua table can be treated as an array.
-- Explicitly returns "not an array" for very sparse arrays.
-- Returns:
-- -1   Not an array
-- 0    Empty table
-- >0   Highest index in the array
local function is_array(table)
    local max = 0
    local count = 0
    for k, v in pairs(table) do
        if type(k) == "number" then
            if k > max then max = k end
            count = count + 1
        else
            return -1
        end
    end
    if max > count * 2 then
        return -1
    end

    return max
end 
ichramm
  • 6,437
  • 19
  • 30
4

No there is no built-in way to differentiate, because in Lua there is no difference.

There are already existing JSON libraries, which probably already do this (eg. Lua CJSON.

Other options are

  • leave it up to the user to specify what type the argument is, or as what type he'd want to have it treated.
  • have arrays explicitly declared by arranging the __newindex such that only new numerical and subsequent indices are allowed to be used.
jpjacobs
  • 9,359
  • 36
  • 45
3

You can simply test this (assuming t is a table):

function isarray(t)
  return #t > 0 and next(t, #t) == nil
end

print(isarray{}) --> false
print(isarray{1, 2, 3}) --> true
print(isarray{a = 1, b = 2, c = 3}) --> false
print(isarray{1, 2, 3, a = 1, b = 2, c = 3}) --> false
print(isarray{1, 2, 3, nil, 5}) --> true

It tests if there is any value in the "array part" of the table, then checks if there is any value after that part, by using next with the last consecutive numeric index.

Note that Lua does some logic to decide when to use this "array part" and the "hash part" of the table. That's why in the last example the provided table is detected as an array: it is dense enough to be considered an array despite the nil in the middle, or in other words, it is not sparse enough. Just as another answer here mentions, this is very useful in the context of data serialization, and you don't have to program it for yourself, you can use Lua underlying logic. If you would serialize that last example, you could use for i = 1, #t do ... end instead of using ipairs.

From my observation in Lua and LuaJIT implementation, the function next always looks up the array part of the table first, so any non array index will be found after the whole array part, even though after that it doesn't follow any particular order. I'm not sure if this is a consistent behaviour across different Lua versions, though.

Also, it's up to you to decide if empty tables should be treated as arrays as well. In this implementation, they are not treated as arrays. You could change it to return next(t) == nil or (#t > 0 and next(t, #t) == nil) to do the opposite.

Anyway, I guess this is the shortest you can get in terms of code lines and complexity, since it is lower bounded by next (which I believe is either O(1) or O(logn)).

PiFace
  • 526
  • 3
  • 19
  • 1
    In my testing this all works except the last one. It returns true when it shouldn’t. – Le Mot Juiced Mar 23 '21 at 15:50
  • `function isarray(tableT) for k, v in pairs(tableT) do if tonumber(k) ~= nil and k ~= #tableT then if tableT[k+1] ~= k+1 then return false end end end return #tableT > 0 and next(tableT, #tableT) == nil end` seems to work – Le Mot Juiced Mar 23 '21 at 16:19
  • hmm you're right. I tested this only in LuaJIT. In the last case, LuaJIT says that `#({1, 2, 3, nil, 5})` is `3`, not `5`. That is not the case in standard Lua. – PiFace Mar 23 '21 at 17:14
  • but this function you provided seems to have something wrong, too. `tableT[k+1] ~= k+1` will be true almost always – PiFace Mar 23 '21 at 17:17
  • It will not be true in the last case you show. That table turns into {1=1, 2=2, 3=3, 5=5}. – Le Mot Juiced Mar 23 '21 at 17:30
  • well, yeah, but it doesn't work as a general case, say, if it is an array of strings – PiFace Mar 23 '21 at 17:33
  • Well exactly, that’s why there are two checks, yours and the addition I made. Did you test it? It passes my tests. Yours worked in all situations except the last one, so I had to write something that would catch the last one. – Le Mot Juiced Mar 23 '21 at 17:34
  • Your code is beautiful and short and mine is ungainly in comparison, I’m sure you could do it in fewer steps, but that’s what I got. – Le Mot Juiced Mar 23 '21 at 17:37
  • Check arrays `{'a', 'b', 'c'}` or `{3,1,4}`, they're identified as `false`, even though they are arrays, that's what I was saying :P – PiFace Mar 23 '21 at 17:39
  • I’m not sure where the miscommunication is here. You’re commenting on my extra code as if I proposed it to replace yours, and that’s not the case. My code is a `for` loop that runs before yours *in the same function*. It’s only designed to test for one edge case, so you pointing out that it doesn’t catch other cases is pointing out that it’s working. – Le Mot Juiced Mar 23 '21 at 21:48
  • 1
    Oh, sorry. Yeah, it was a misunderstanding, then. – PiFace Mar 23 '21 at 22:46
  • It’s okay, I’m sure you could come up with a cleaner way to do the same thing. I feel like the full solution should be in an answer somewhere, not just the comments, so if you want to modify your post, that’s great, and if you don’t, I’ll make a separate post. – Le Mot Juiced Mar 23 '21 at 23:13
2

Thanks. I developed the following code and it works:

---Checks if a table is used as an array. That is: the keys start with one and are sequential numbers
-- @param t table
-- @return nil,error string if t is not a table
-- @return true/false if t is an array/isn't an array
-- NOTE: it returns true for an empty table
function isArray(t)
    if type(t)~="table" then return nil,"Argument is not a table! It is: "..type(t) end
    --check if all the table keys are numerical and count their number
    local count=0
    for k,v in pairs(t) do
        if type(k)~="number" then return false else count=count+1 end
    end
    --all keys are numerical. now let's see if they are sequential and start with 1
    for i=1,count do
        --Hint: the VALUE might be "nil", in that case "not t[i]" isn't enough, that's why we check the type
        if not t[i] and type(t[i])~="nil" then return false end
    end
    return true
end
AlexStack
  • 16,766
  • 21
  • 72
  • 104
0

I wrote this function for pretty printing lua tables, and had to solve the same problem. None of the solutions here account for edge cases like some keys being numbers but others not. This tests every index to see if it's compatible with being an array.

function pp(thing)
    if type(thing) == "table" then
        local strTable = {}
        local iTable = {}
        local iterable = true
        for k, v in pairs(thing) do
            --if the key is a string, we don't need to do "[key]"
            local key = (((not (type(k) == "string")) and "["..pp(k).."]") or k)
            --this tests if the index is compatible with being an array
            if (not (type(k) == "number")) or (k > #thing) or(k < 1) or not (math.floor(k) == k) then
                iterable = false
            end
            local val = pp(v)
            if iterable then iTable[k] = val end
            table.insert(strTable, (key.."="..val))
        end
        if iterable then strTable = iTable end
        return string.format("{%s}", table.concat(strTable,","))
    elseif type(thing) == "string" then
        return '"'..thing..'"'
    else
        return tostring(thing)
    end
end
Houshalter
  • 2,508
  • 1
  • 18
  • 20
0

This is not pretty, and depending on how large and cleverly-deceptive the table is, it might be slow, but in my tests it works in each of these cases:

  • empty table

  • array of numbers

  • array with repeating numbers

  • letter keys with number values

  • mixed array/non-array

  • sparse array (gaps in index sequence)

  • table of doubles

  • table with doubles as keys

    function isarray(tableT)   
    
        --has to be a table in the first place of course
        if type(tableT) ~= "table" then return false end
    
        --not sure exactly what this does but piFace wrote it and it catches most cases all by itself
        local piFaceTest = #tableT > 0 and next(tableT, #tableT) == nil
        if piFaceTest == false then return false end
    
        --must have a value for 1 to be an array
        if tableT[1] == nil then return false end
    
         --all keys must be integers from 1 to #tableT for this to be an array
         for k, v in pairs(tableT) do
             if type(k) ~= "number" or (k > #tableT) or(k < 1) or math.floor(k) ~= k  then return false end
         end
    
         --every numerical key except the last must have a key one greater
         for k,v in ipairs(tableT) do
             if tonumber(k) ~= nil and k ~= #tableT then
                 if tableT[k+1] == nil then
                     return false
                 end
             end
         end
    
         --otherwise we probably got ourselves an array
         return true
     end
    

Much credit to PiFace and Houshalter, whose code I based most of this on.

Le Mot Juiced
  • 3,761
  • 1
  • 26
  • 46
-1

At least in luajit 2.1.0-beta3 (where I tested it), I reliably get sorted iteration with pairs() for numerical indices, therefore this should also work and could be a bit faster than https://stackoverflow.com/a/25709704/7787852

Even if the Lua reference manual says explicitly that the iteration order of pairs cannot be relied upon.

local function is_array(t)
  local prev = 0
  for k in pairs(t) do
    if k ~= prev + 1 then
      return false
    end
    prev = prev + 1
  end
  return true
end
mg979
  • 43
  • 5
  • This relies on the undefined iteration order of pairs and is thus incorrect. In particular, it fails if something is a Lua sequence but stored in the hash part. `is_array{[3] = 3, [2] = 2, [1] = 1}` returns `false` on my PUC Lua 5.4, but should return `true` since the contiguous integer keys 1-3 are used. – Luatic Feb 07 '23 at 19:35
  • Right. Even if, at least in luajit 2.1.0, I reliably get sorted iteration with pairs() for numerical indices. And your example returns true as well. – mg979 Feb 08 '23 at 12:51
  • Yes, but LuaJIT isn't Lua, just one particular Lua implementation. You're relying on undefined behavior / particular interpreter specifics here. LuaJIT might as well change this behavior any time. – Luatic Feb 08 '23 at 17:44