0

What is a good way of hashing a hierarchy (similar to a file structure) in python?

I could convert the whole hierarchy into a dotted string and then hash that, but is there a better (or more efficient) way of doing this without going back and forth all the time?

An example of a structure I might want to hash is:

a -> b1 -> c -> 1 -> d
a -> b2 -> c -> 2 -> d
a -> c -> 1 -> d
Dan
  • 33,953
  • 24
  • 61
  • 87
  • By hierarchy, do you mean a single file's list of path components, i.e. ["usr", "local", "test", "myfile"]? – DNS Mar 17 '09 at 13:15
  • Downmodding, question is very unclear and confusing. – ddaa Mar 17 '09 at 13:44
  • added an example to make it a little clearer... – Dan Mar 17 '09 at 13:59
  • What does the "->" symbol mean in the example? What actual Python data structure contains the source data that creates your hierarchy? – S.Lott Mar 17 '09 at 14:52

3 Answers3

8

If you have access to your hierarchy components as a tuple, just hash it - tuples are hashable. You may not gain a lot over conversion to and from a delimited string, but it's a start.

If this doesn't help, perhaps you could provide more information about how you store the hierarchy/path information.

Blair Conrad
  • 233,004
  • 25
  • 132
  • 111
  • +1. python isn't javascript, dictionary keys can be more than just strings. unfortunately, it's not Lua either, where keys can be _any_ value – Javier Mar 17 '09 at 13:50
4

How do you want to access your hierarchy?

If you're always going to be checking for a full path, then as suggested, use a tuple: eg:

>>> d["a","b1","c",1,"d"] = value

However, if you're going to be doing things like "quickly find all the items below "a -> b1", it may make more sense to store it as a nested hashtable (otherwise you must iterate through all items to find those you're intereted in).

For this, a defaultdict is probably the easiest way to store. For example:

from collections import defaultdict

def new_dict(): return defaultdict(new_dict)
d = defaultdict(new_dict)

d["a"]["b1"]["c"][1]["d"] = "test"
d["a"]["b2"]["c"][2]["d"] = "test2"
d["a"]["c"][1]["d"] = "test3"

print d["a"]["c"][1]["d"]  # Prints test3
print d["a"].keys()        # Prints ["c", "b1", "b2"]
Brian
  • 116,865
  • 28
  • 107
  • 112
1

You can make any object hashable by implementing the __hash__() method

So you can simply add a suitable __hash__() method to the objects storing your hierarchy, e.g. compute the hash recursively, etc.

Ber
  • 40,356
  • 16
  • 72
  • 88