Given an instance of a custom, new-style Python class, what is a good way to hash it and get a unique ID-like value from it to use for various purposes? Think md5sum or sha1sum of a given class instance.
The approach I am currently using pickles the instance and runs the result through hashlib's hexdigest, storing the resulting hash string in an attribute of the instance (this attribute is never part of the pickle/unpickle procedures, fyi). Except now I've run into a case where a third-party module uses nested classes, and there is no really good way to pickle those without some hacks. I figure that I am missing some clever little Python trick to accomplish this.
Edit:
Example code, because it seems to be a requirement around here to get any traction on a question. The class below can be initialized and the self._uniq_id property can be properly set up.
#!/usr/bin/env python
import hashlib
# cPickle or pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try
# Single class, pickles fine.
class FooBar(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")
    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = "bar"
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}
    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id
    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
This next class, however, cannot be initialized because Bar is nested in Foo:
#!/usr/bin/env python
import hashlib
# cPickle or pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try
# Nested class, can't pickle for hexdigest.
class Foo(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")
    class Bar(object):
        pass
    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = self.Bar()
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}
    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id
    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
The error I receive is:
Traceback (most recent call last):
  File "./nest_test.py", line 70, in <module>
    foobar2 = Foo()
  File "./nest_test.py", line 49, in __init__
    self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
(nest_test.py has both classes in it, hence the line number offset.)
I found out that pickling requires the __getstate__() method, so I also implemented __setstate__() for completeness. But given the existing warnings about security and pickle, there's got to be a better way to do this.
Based on what I have read so far, the error stems from Python not being able to resolve the nested class. It tries to look up the attribute __main__.Bar, which doesn't exist. It really needs to find __main__.Foo.Bar instead, but there is no really good way to make it do that. I bumped into another SO answer here that provides a "hack" to trick Python, but it came with a stern warning that such an approach is not advisable, and that one should either use something other than pickling or move the nested class definition out to module level.
However, the original question behind that SO answer, I believe, was about pickling and unpickling to a file. I only need to pickle in order to use the requisite hashlib functions, which seem to operate on a byte array (much like I am used to in .NET), and pickling (especially cPickle) is fast and optimized compared with writing my own byte array routine.