1

I have two list of dictionaries representing the rows of two tables, so:

tableA = [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]

tableB = [{"id": 1, "name": "bar"}, {"id": 3, "name": "baz"}]

I want to obtain the difference in the following way:

added = [{"id": 3, "name": "baz"}]

updated = [{"id": 1, "name": "bar"}]

I know, that id is unique.

So, I am planning to loop over tableB, ask for the id, if they are equal then compare the dicts to know if it is updated, if not the id is new and it is added.

for x in tableA:
    idexists = false
    for y in tableY:
        if x["id"] == y["id"]:
           if x != y:
              updated.append(x)
           idexists = true
    if not idexists:
        added.append(x)

Is it correct? Can it be done in pythonic way?

FacundoGFlores
  • 7,858
  • 12
  • 64
  • 94

1 Answers1

2

I would restructure the tables into a more conveninent form of id: name dictionaries and then diff:

from deepdiff import DeepDiff


tableA = [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]

tableB = [{"id": 1, "name": "bar"}, {"id": 3, "name": "baz"}]

tableA = {item["id"]: item["name"] for item in tableA}
tableB = {item["id"]: item["name"] for item in tableB}

print(DeepDiff(tableA, tableB))

Prints:

{
  'dictionary_item_added': {'root[3]'}, 
  'dictionary_item_removed': {'root[2]'}, 
  'values_changed': {
    'root[1]': { 
      'old_value': 'foo', 
      'new_value': 'bar'
    }
  }
}

For calculating the differences used deepDiff module suggested here, but you can use any of the suggested solutions from the linked thread.


Another idea could be to transform the list of dictionaries into a pandas.DataFrame and then diff:

Community
  • 1
  • 1
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • Great! But the thing is, there woule be more columns than just `name`, would it be efficient? because the tables are large. (Now I will take a look to pandas) – FacundoGFlores Sep 19 '16 at 20:19
  • @FacundoGFlores please measure to find if this would be more efficient in your case on a large table. The big advantage of re-structuring is that we have is the constant-time lookups by `id`. Note that we lose order in this case (assuming it is not significant).. – alecxe Sep 19 '16 at 20:22