8

(Also posted on the Lua mailing list)

So I've been writing deep-copy algorithms, and I wanna test them to see if they work the way I want them to. While I do have access to the original->copy map, I want a general-purpose deep-compare algorithm that must be able to compare table keys (tables as keys?).

My deep-copy algorithm(s) are avaliable here: https://gist.github.com/SoniEx2/fc5d3614614e4e3fe131 (it's not very organized, but there are 3 of them, one uses recursive calls, the other uses a todo table, and the other simulates a call stack (in a very ugly but 5.1-compatible way))

Recursive version:

local function deep(inp,copies)
  if type(inp) ~= "table" then
    return inp
  end
  local out = {}
  copies = (type(copies) == "table") and copies or {}
  copies[inp] = out -- use normal assignment so we use copies' metatable (if any)
  for key,value in next,inp do -- skip metatables by using next directly
    -- we want a copy of the key and the value
    -- if one is not available on the copies table, we have to make one
    -- we can't do normal assignment here because metatabled copies tables might set metatables

    -- out[copies[key] or deep(key,copies)]=copies[value] or deep(value,copies)
    rawset(out,copies[key] or deep(key,copies),copies[value] or deep(value,copies))
  end
  return out
end

Edit: I found things like this which don't really handle tables as keys: http://snippets.luacode.org/snippets/Deep_Comparison_of_Two_Values_3 (Copy of snippet below)

function deepcompare(t1,t2,ignore_mt)
  local ty1 = type(t1)
  local ty2 = type(t2)
  if ty1 ~= ty2 then return false end
  -- non-table types can be directly compared
  if ty1 ~= 'table' and ty2 ~= 'table' then return t1 == t2 end
  -- as well as tables which have the metamethod __eq
  local mt = getmetatable(t1)
  if not ignore_mt and mt and mt.__eq then return t1 == t2 end
  for k1,v1 in pairs(t1) do
    local v2 = t2[k1]
    if v2 == nil or not deepcompare(v1,v2) then return false end
  end
  for k2,v2 in pairs(t2) do
    local v1 = t1[k2]
    if v1 == nil or not deepcompare(v1,v2) then return false end
  end
  return true
end

Serializing is also not an option, as order of serialization is "random".

Machavity
  • 30,841
  • 27
  • 92
  • 100
SoniEx2
  • 1,864
  • 3
  • 27
  • 40
  • What is the question here exactly? Do those work? Do they fail on certain cases? I'm also not sure I fully understand what you want exactly. You want to be able to compare two arbitrary tables to one another? The difficulty there is going to be in insuring that you exhaust the keys in the tables on both sides I would think. – Etan Reisner Sep 18 '14 at 23:03
  • @EtanReisner I don't have an easy way to test if they work, other than prettyprinting the input and output and then manually comparing them. If I had an automated way to test it, (aka, deep-compare it) I wouldn't have to spend my time manually checking if it works correctly... It would also be useful for a testsuite... – SoniEx2 Sep 19 '14 at 00:08
  • Are you trying to compare tables as keys by value or by contents? What about functions (as either keys or values)? What about C functions (same)? What about userdata (same)? – Etan Reisner Sep 19 '14 at 00:29
  • @EtanReisner My functions only copy tables (so far), so everything else you can just == or rawequals(). My main problem is things like `{[{}] = {}}`. – SoniEx2 Sep 19 '14 at 02:04
  • Right. You need to define what "equality" means for that scenario before you can solve the problem. Are all empty tables "equal"? That might be reasonable except it doesn't help you when you need to to lookups in the copy table from an item in the original table. – Etan Reisner Sep 19 '14 at 02:19
  • @EtanReisner Yes it should consider {} == {}. (It should also handle recursion but I expected that to be implied on my original message) – SoniEx2 Sep 19 '14 at 12:03
  • How do you propose to copy and then test for equality a table like this? `{[{}] = {"foo", bar"}, [{}] = {"baz", "quux"}}` Just at a procedural level. – Etan Reisner Sep 19 '14 at 13:34
  • @EtanReisner To copy you already have the algorithm (actually you have multiple algorithms), to compare, it would compare the keys and values. At first it might see that the key for `o1[{}]={"foo","bar"}` is the same as `o2[{}]={"baz","quux"}`, but then it should compare the value, and it'll see that `{"foo","bar"}` isn't the same as `{"baz","quux"}`. If you have multiple `[{}]={}` entries, it should compare and track them until it finds that both tables have an equal number of `[{}]={}` entries, then return `true` (or `false` if they don't). – SoniEx2 Sep 19 '14 at 14:57
  • That's profoundly expensive in the general case, you realize that, right? It can certainly be done but requires keeping an enormous amount of state. Because you can't ever clean that up since a table can appear at more than one key location. – Etan Reisner Sep 19 '14 at 15:02
  • Check solution here: https://stackoverflow.com/a/32660766/2336707 – Zachary Feb 15 '23 at 12:24

2 Answers2

5

As others said, that depends a lot on your definition of equivalence. If you want this to be true:

local t1 = {[{}] = {1}, [{}] = {2}}
local t2 = {[{}] = {1}, [{}] = {2}}
assert( table_eq(t1, t2) )

If you do, then each time the key in t1 is a table, you'll have to check its equivalence with every table key in t2 and try them one by one. This is a way to do it (metatable stuff left out for readability).

function table_eq(table1, table2)
   local avoid_loops = {}
   local function recurse(t1, t2)
      -- compare value types
      if type(t1) ~= type(t2) then return false end
      -- Base case: compare simple values
      if type(t1) ~= "table" then return t1 == t2 end
      -- Now, on to tables.
      -- First, let's avoid looping forever.
      if avoid_loops[t1] then return avoid_loops[t1] == t2 end
      avoid_loops[t1] = t2
      -- Copy keys from t2
      local t2keys = {}
      local t2tablekeys = {}
      for k, _ in pairs(t2) do
         if type(k) == "table" then table.insert(t2tablekeys, k) end
         t2keys[k] = true
      end
      -- Let's iterate keys from t1
      for k1, v1 in pairs(t1) do
         local v2 = t2[k1]
         if type(k1) == "table" then
            -- if key is a table, we need to find an equivalent one.
            local ok = false
            for i, tk in ipairs(t2tablekeys) do
               if table_eq(k1, tk) and recurse(v1, t2[tk]) then
                  table.remove(t2tablekeys, i)
                  t2keys[tk] = nil
                  ok = true
                  break
               end
            end
            if not ok then return false end
         else
            -- t1 has a key which t2 doesn't have, fail.
            if v2 == nil then return false end
            t2keys[k1] = nil
            if not recurse(v1, v2) then return false end
         end
      end
      -- if t2 has a key which t1 doesn't have, fail.
      if next(t2keys) then return false end
      return true
   end
   return recurse(table1, table2)
end

assert( table_eq({}, {}) )
assert( table_eq({1,2,3}, {1,2,3}) )
assert( table_eq({1,2,3, foo = "fighters"}, {["foo"] = "fighters", 1,2,3}) )
assert( table_eq({{{}}}, {{{}}}) )
assert( table_eq({[{}] = {1}, [{}] = {2}}, {[{}] = {1}, [{}] = {2}}) )
assert( table_eq({a = 1, [{}] = {}}, {[{}] = {}, a = 1}) )
assert( table_eq({a = 1, [{}] = {1}, [{}] = {2}}, {[{}] = {2}, a = 1, [{}] = {1}}) )

assert( not table_eq({1,2,3,4}, {1,2,3}) )
assert( not table_eq({1,2,3, foo = "fighters"}, {["foo"] = "bar", 1,2,3}) )
assert( not table_eq({{{}}}, {{{{}}}}) )
assert( not table_eq({[{}] = {1}, [{}] = {2}}, {[{}] = {1}, [{}] = {2}, [{}] = {3}}) )
assert( not table_eq({[{}] = {1}, [{}] = {2}}, {[{}] = {1}, [{}] = {3}}) )
Hisham H M
  • 6,398
  • 1
  • 29
  • 30
  • Shouldn't `recurse` be declared as a `local function`? – SoniEx2 Sep 24 '14 at 11:44
  • 2
    As wonderful as this function is, for me it fails intermittently on this assertion: `assert( table_eq({a = 1, [{}] = {1}, [{}] = {2}}, {[{}] = {2}, a = 1, [{}] = {1}}) )` provided above. I ended up using code found in LuaUnit to do my comparisons – David Corbin Jul 27 '20 at 10:35
1

Instead of direct comparison you may try to serialize each of the tables and compare the serialized results. There are several serializers that handle table as keys and can serialize deep and recursive structures. For example, you may try Serpent (available as a standalone module and also included in Mobdebug):

local dump = pcall(require, 'serpent') and require('serpent').dump
  or pcall(require, 'mobdebug') and require('mobdebug').dump
  or error("can't find serpent or mobdebug")
local a = dump({a = 1, [{}] = {}})
local b = dump({[{}] = {}, a = 1})
print(a==b)

This returns true for me (both Lua 5.1 and Lua 5.2). One of the caveats is that the serialization result depends on the order in which keys are serialized. The tables as key by default are sorted based on their tostring value, which means that the order depends on their allocation address and it's not difficult to come up with an example that fails under Lua 5.2:

local dump = pcall(require, 'serpent') and require('serpent').dump
  or pcall(require, 'mobdebug') and require('mobdebug').dump
  or error("can't find serpent or mobdebug")
local a = dump({a = 1, [{}] = {1}, [{}] = {2}})
local b = dump({[{}] = {2}, a = 1, [{}] = {1}})
print(a==b) --<-- `false` under Lua 5.2

One way to protect against this is to consistently represent tables in keys comparison; for example, instead of default tostring, you may want to serialize tables and their values and sort the keys based on that (serpent allows a custom handler for sortkeys), which would make the sorting more robust and the serialized results more stable.

Paul Kulchenko
  • 25,884
  • 3
  • 38
  • 56