8

I have a massive Python dictionary with over 90,000 entries. For reasons I won't get into, I need to store this dictionary in my database and then, at a later point, rebuild the dictionary from the database entries.

I am trying to set up a procedure to verify that my storage and recompilation were faithful and that my new dictionary is equivalent to the old one. What is the best methodology for testing this?

There are minor differences and I want to figure out what they are.

Spencer
  • If your values all have equivalence defined, just `dict1 == dict2` should work – Thomas Sep 30 '11 at 14:33
  • I am assuming that there could be some minor problems, and if there are minor problems, I want to know what they are i.e. what the differences are. – Spencer Sep 30 '11 at 14:34
  • Do you need to do a straight-up `==` check, or would you be interested in knowing which elements are different (e.g., for debugging)? –  Sep 30 '11 at 14:38
  • Here's a very nice class that does exactly what you want - http://stackoverflow.com/questions/1165352/fast-comparison-between-two-python-dictionary/1165552#1165552 – ronakg Sep 30 '11 at 14:46
  • "minor problems"? What do you mean by "minor problems"? – S.Lott Sep 30 '11 at 15:36
  • I mean that I am using urllib to encode my data, and if the original format wasn't properly URL-encoded it would lead to minor discrepancies. – Spencer Sep 30 '11 at 15:46

3 Answers

13

The most obvious approach is of course:

if oldDict != newDict:
    print("**Failure to rebuild, new dictionary is different from the old")

That ought to be the fastest possible, since it relies on Python's internals to do the comparison.

UPDATE: It seems you're not after "equal", but something weaker. I think you need to edit your question to make it clear what you consider "equivalent" to mean.
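If what you need is a report of the differences rather than a yes/no answer, one possible sketch is to compare the two dictionaries key by key. The helper name `diff_dicts` and the sample data are made up for illustration, and the code is written for Python 3; it only assumes the values support `!=`.

```python
def diff_dicts(old, new):
    """Return (removed, added, changed) describing how `new` differs from `old`."""
    old_keys = set(old)
    new_keys = set(new)
    # Keys that disappeared during the round trip, with their old values.
    removed = {k: old[k] for k in old_keys - new_keys}
    # Keys that only exist in the rebuilt dictionary.
    added = {k: new[k] for k in new_keys - old_keys}
    # Keys present in both whose values differ, as (old, new) pairs.
    changed = {k: (old[k], new[k])
               for k in old_keys & new_keys if old[k] != new[k]}
    return removed, added, changed


# Hypothetical sample data, not the asker's real dictionaries.
old_dict = {'a': 1, 'b': 2, 'c': 3}
new_dict = {'b': 2, 'x': 2, 'a': 5}

removed, added, changed = diff_dicts(old_dict, new_dict)
print(removed)  # {'c': 3}
print(added)    # {'x': 2}
print(changed)  # {'a': (1, 5)}
```

Because it works on keys rather than on (key, value) pairs, this does not require the values to be hashable.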

KalEl
unwind
  • I have tried this and there are differences. I want to set up a procedure that lets me know what those differences are. – Spencer Sep 30 '11 at 14:37
  • @Peter if you want to "set up a procedure that lets me know what those differences are", which I think was clear in your question, why would you mark this answer as accepted? – agf Oct 03 '11 at 22:19
  • And what if you have nested objects, not primitives? – dtc Sep 07 '17 at 18:45
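For the nested case raised in the last comment, `!=` still compares nested dicts and lists by value, but it won't tell you where they differ. A rough sketch of a recursive walk is below; `deep_diff`, `MISSING`, and the sample data are hypothetical names for illustration, it is written for Python 3, and it only descends into nested dicts (lists and other values are compared whole).

```python
MISSING = object()  # sentinel meaning "key absent on this side"


def deep_diff(old, new, path=()):
    """Yield (path, old_value, new_value) for every spot where the two differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Sort by repr so mixed key types still give a stable order.
        for key in sorted(set(old) | set(new), key=repr):
            if key not in old:
                yield path + (key,), MISSING, new[key]
            elif key not in new:
                yield path + (key,), old[key], MISSING
            else:
                # Present on both sides: recurse into the pair of values.
                for item in deep_diff(old[key], new[key], path + (key,)):
                    yield item
    elif old != new:
        yield path, old, new


# Hypothetical nested sample data.
old_dict = {'a': {'x': 1, 'y': [1, 2]}, 'b': 2}
new_dict = {'a': {'x': 1, 'y': [1, 3]}, 'c': 2}

for path, old_value, new_value in deep_diff(old_dict, new_dict):
    print(path, old_value, new_value)
```

Each result carries the tuple of keys leading to the difference, so `('a', 'y')` means `d['a']['y']`.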
2
>>> d1 = {'a':1,'b':2,'c':3}
>>> d2 = {'b':2,'x':2,'a':5}
>>> set(d1.iteritems()) - set(d2.iteritems()) # items in d1 not in d2
set([('a', 1), ('c', 3)])
>>> set(d2.iteritems()) - set(d1.iteritems()) # items in d2 not in d1
set([('x', 2), ('a', 5)])
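The session above is Python 2 (`iteritems`). A rough Python 3 equivalent is sketched below; there `dict.items()` returns a set-like view, so the set operators can be applied to it directly. This assumes every value is hashable; to my knowledge, values such as lists make these set operations raise `TypeError`, in which case a key-by-key comparison is the safer route.

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'x': 2, 'a': 5}

# Items views support set operators when the values are hashable.
only_in_d1 = d1.items() - d2.items()   # items in d1 not in d2
only_in_d2 = d2.items() - d1.items()   # items in d2 not in d1
in_either = d1.items() ^ d2.items()    # symmetric difference of both

# Sets are unordered, so sort before printing for a stable display.
print(sorted(only_in_d1))  # [('a', 1), ('c', 3)]
print(sorted(only_in_d2))  # [('a', 5), ('x', 2)]
```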

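Edit: Don't vote for this answer. Go to "Fast comparison between two Python dictionary" (http://stackoverflow.com/questions/1165352/fast-comparison-between-two-python-dictionary/1165552#1165552) and add an upvote. It is a very complete solution.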

Community
Steven Rumbalski
2

You could start with something like this and tweak it to suit your needs:

>>> import random
>>> bigd = dict([(x, random.randint(0, 1024)) for x in xrange(90000)])
>>> bigd2 = dict([(x, random.randint(0, 1024)) for x in xrange(90000)])
>>> dif = set(bigd.items()) - set(bigd2.items())  # items in bigd not in bigd2
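One way the tweaking might go, as a sketch only: build a second dictionary that differs from the first in a few known places, then list the keys whose values disagree. The seed, the injected differences, and the names `MISSING` and `differing` are assumptions made up for this example, and the code is written for Python 3 (`range`, set-like `keys()` views).

```python
import random

MISSING = object()  # sentinel for "key not present in this dictionary"

rng = random.Random(42)  # fixed seed so the example is repeatable
bigd = {x: rng.randint(0, 1024) for x in range(90000)}

# Simulate an imperfect round trip with three deliberate discrepancies.
bigd2 = dict(bigd)
bigd2[10] = -1       # changed value (-1 can never come from randint(0, 1024))
del bigd2[20]        # entry lost on the way back
bigd2[90000] = 7     # entry that was never in the original

# Walk the union of keys and keep those whose values disagree.
differing = sorted(
    k for k in bigd.keys() | bigd2.keys()
    if bigd.get(k, MISSING) != bigd2.get(k, MISSING)
)

print(len(differing))  # 3
print(differing)       # [10, 20, 90000]
```

With 90,000 entries this is a single pass over the keys, and unlike the `set(d.items())` approach it does not need hashable values.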
Facundo Casco