Checking if Two Massive Python Dictionaries are Equivalent

Question

I have a massive Python dictionary with over 90,000 entries. For reasons I won't get into, I need to store this dictionary in my database and then, at a later point, rebuild the dictionary from the database entries.

I am trying to set up a procedure to verify that my storage and recompilation were faithful and that my new dictionary is equivalent to the old one. What is the best methodology for testing this?

There are minor differences and I want to figure out what they are.

If your values all have equivalence defined, just dict1 == dict2 should work — Thomas, Sep 30 '11 at 14:33
I am assuming that there could be some minor problems, and if there are minor problems, I want to know what they are i.e. what the differences are. — Spencer, Sep 30 '11 at 14:34
Do you need to do a straight-up `==` check, or would you be interested in knowing which elements are different (e.g., for debugging)? — , Sep 30 '11 at 14:38
Here's a very nice class that does exactly what you want - http://stackoverflow.com/questions/1165352/fast-comparison-between-two-python-dictionary/1165552#1165552 — ronakg, Sep 30 '11 at 14:46
I mean that I am using urllib to encode my data, and if the original format wasn't properly URL-encoded it would lead to minor discrepancies. — Spencer, Sep 30 '11 at 15:46

score 13 · Accepted Answer · edited Aug 18 '17 at 22:51


The most obvious approach is of course:

if oldDict != newDict:
    # Parenthesised form works as a statement in Python 2 and a call in Python 3
    print("**Failure to rebuild, new dictionary is different from the old")

That ought to be the fastest possible, since it relies on Python's internals to do the comparison.
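As a small illustration of what the built-in comparison covers (a sketch written for Python 3; the sample data is made up): `==` on dictionaries ignores insertion order and compares nested lists and dictionaries by value, but it treats `1` and `'1'` as different. A value that comes back from storage as a string is therefore enough to make the whole comparison fail.

```python
old_dict = {"a": 1, "b": [1, 2, {"c": 3}]}

# Same content, different insertion order
rebuilt_same = {"b": [1, 2, {"c": 3}], "a": 1}

# Hypothetical round-trip problem: the integer came back as a string
rebuilt_str = {"a": "1", "b": [1, 2, {"c": 3}]}

same_result = old_dict == rebuilt_same
str_result = old_dict == rebuilt_str

print(same_result)  # True
print(str_result)   # False
```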

UPDATE: It seems you're not after "equal", but something weaker. I think you need to edit your question to make it clear what you consider "equivalent" to mean.

answered Sep 30 '11 at 14:34 by unwind · edited Aug 18 '17 at 22:51 by KalEl

I have tried this and there are differences. I want to set-up a procedure that lets me know what those differences are. – Spencer Sep 30 '11 at 14:37

@Peter if you want to "set-up a procedure that lets me know what those differences are" which I think was clear in your question, why would you mark this answer as accepted? – agf Oct 03 '11 at 22:19
And what if you have nested objects, not primitives? – dtc Sep 07 '17 at 18:45

score 2 · Answer 2 · edited May 23 '17 at 12:32


>>> d1 = {'a':1,'b':2,'c':3}
>>> d2 = {'b':2,'x':2,'a':5}
>>> set(d1.iteritems()) - set(d2.iteritems()) # items in d1 not in d2
set([('a', 1), ('c', 3)])
>>> set(d2.iteritems()) - set(d1.iteritems()) # items in d2 not in d1
set([('x', 2), ('a', 5)])

Edit: Don't vote for this answer. Go to "Fast comparison between two Python dictionary" and add an upvote. It is a very complete solution.
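The set-based comparison above needs every value to be hashable, and it reports a changed value as two separate tuples. A key-by-key variant avoids both limitations. The following is a minimal sketch for Python 3; the helper name `diff_dicts` and its return shape are assumptions made for illustration, not part of any library.

```python
def diff_dicts(old, new):
    """Return (removed, added, changed) describing how `new` differs from `old`.

    Hypothetical helper: compares values with != per key, so the values
    do not need to be hashable.
    """
    removed = {k: old[k] for k in old if k not in new}
    added = {k: new[k] for k in new if k not in old}
    # For keys present in both, keep the (old value, new value) pair
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    return removed, added, changed


d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'x': 2, 'a': 5}

removed, added, changed = diff_dicts(d1, d2)
print(removed)  # {'c': 3}
print(added)    # {'x': 2}
print(changed)  # {'a': (1, 5)}
```

Each of the three passes is linear in the number of keys, so 90,000 entries should not be a concern.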

answered Sep 30 '11 at 14:49 by Steven Rumbalski · edited May 23 '17 at 12:32 by Community

Other post doesn't use `iteritems`. I like this approach better. – sholsapp Mar 09 '12 at 18:23

score 2 · Answer 3 · answered Sep 30 '11 at 14:50

You could start with something like this and tweak it to suit your needs

>>> import random
>>> bigd = dict([(x, random.randint(0, 1024)) for x in xrange(90000)])
>>> bigd2 = dict([(x, random.randint(0, 1024)) for x in xrange(90000)])
>>> dif = set(bigd.items()) - set(bigd2.items())  # items in bigd not in bigd2
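A Python 3 adaptation of the snippet above might look like the following sketch. Two independently random dictionaries would differ almost everywhere, so this version copies the first dictionary and injects three known discrepancies instead. The seed, the keys 10, 20 and 90000, and the injected values are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
bigd = {x: rng.randint(0, 1024) for x in range(90000)}

bigd2 = dict(bigd)   # stand-in for the dictionary rebuilt from the database
bigd2[10] = -1       # simulate a changed value (randint never produces -1 here)
del bigd2[20]        # simulate a lost entry
bigd2[90000] = 7     # simulate an extra entry

# In Python 3, items views support set operations directly
# (this requires the values to be hashable, as integers are).
only_in_old = bigd.items() - bigd2.items()
only_in_new = bigd2.items() - bigd.items()

print(sorted(only_in_old))
print(sorted(only_in_new))
```

Running the difference in both directions matters: an entry that exists only in the rebuilt dictionary never shows up in `only_in_old`.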
