-1

Context I want to implement a language on top of python where I need to add meta-data to existing python objects, but I don't want to modify the underlying representation of all python objects. I can compute and store the id of EVERY object, then use as key for a traditional Python dict. but this is an incredible waste, since those IDs are very quickly derivable from the objects themselves. Other programming languages (Lisp, Java, etc.) have an "identity hash" which hashes based on the identity of the key object with ever calling its hash function.

Can I do this in Python without without writing my own hash from scratch?

Dan Oblinger
  • 489
  • 3
  • 15
  • 2
    if you want to use the `id` as keys, whats wrong with just using the `id` as keys and letting the dictionary work internally like it always does? – Paritosh Singh Apr 03 '19 at 18:10
  • 2
    if you use the `id(item)` as key ... and want to get the item from the dict ... and don't have it ...how are you going to get the id of it? and if you already have the `item` ... why get its `id(item)` to retrieve it from a dict............................ you already _have it_ - no retrieval needed .... – Patrick Artner Apr 03 '19 at 18:15
  • You want a true Map with arbitrary keys instead of just string keys? – Jared Smith Apr 03 '19 at 18:15
  • Are you storing these id's somewhere? I ask because I think @Patrick has a very valid point. – martineau Apr 03 '19 at 18:21
  • Also note that your claim that "all objects have a unique id" isn't strictly true. For example two strings with identical content may have the same id because the value's been "interned" (see [`sys.intern()`](https://docs.python.org/3/library/sys.html#sys.intern)). – martineau Apr 03 '19 at 18:29
  • I rewrote the question. @PatrickArtner I did not say I wanted to store the object as the VALUE of the dict. (this would effectively be a set), only that I want identity hashing such that the returned value is derived from id equivalence. – Dan Oblinger Apr 03 '19 at 23:12
  • @Dan: And I asked how you want to retrieve the object from your dict. If you use `id(instance)` as key - you need to somehow get the id() - for that you need the instance. If you store the thing 1st time: no problem. id()s are "random" and "not replicable" over different start sessions of your program, there is no way for you to say _this id is the key for this object_ (like you could for f.e. a Person Object if you combine "Name,Firstname,Bloodtype,Birthdate,Gender (add more to make it more unique)". So you stuff your thing in, then want to retrieve it ... by what? – Patrick Artner Apr 04 '19 at 05:52
  • If you have a small set of tags to be hashed: you are optimizing wrongly. Just take a normal dict, use integers (1...n, or an EnumInt) as key and do not concern yourself with how the dict works. If you can _measure_ a bottleneck in your system, and the _only thing_ to be optimized is the way how a dict stores / hashes its values - I would predict your program is very optimal to begin with or you overlooked things to optimize. Now - if this is just a pure hypothetical question of "is it is possible to do so?" ... then: Close vote as Too Broad. Try to code it, come back with specific question. – Patrick Artner Apr 04 '19 at 05:59
  • And thirdly: if you have your own _arbritrary python objects_ and want to "optimize" its hashfunction, see [how-to-implement-a-good-hash-function-in-python](https://stackoverflow.com/questions/4005318/how-to-implement-a-good-hash-function-in-python) - creating a good hash is difficult though, I wouldn't bother to do so if I only wanted to store strings. See [what-is-the-xy-problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) and [when-is-optimisation-premature](https://stackoverflow.com/questions/385506/when-is-optimisation-premature). – Patrick Artner Apr 04 '19 at 06:03
  • Chances are that your "tags" are already interned ( https://stackoverflow.com/questions/15541404/python-string-interning and https://docs.python.org/3/library/sys.html?highlight=sys#sys.intern ) but that is a implementation detail you have not much influence over - if they are, they are stored only once and all others are references to that one anyway ==> see premature optimization. – Patrick Artner Apr 04 '19 at 06:07
  • @PatrickArtner I want to implement a language on top of python where I need to add meta-data to existing python objects, but I don't want to modify the underlying representation of all python objects. I agree I can compute and store the id of EVERY object, then use as key. but this is an incredible waste, since those IDs are very derivable from the objects themselves -- in other languages identity hash does not req redundant storage of this id. My question is if I can do this in python (without writing my own hash from scratch)? – Dan Oblinger Apr 04 '19 at 10:25
  • @PatrickArtner I agree. confusion could have been avoided if I had added context. So I completely rewrote the question. Thanks. (it may be that this is simply impossible to do in Python... if I so I would love to know that it is impossible.) – Dan Oblinger Apr 04 '19 at 19:33

1 Answers1

2

@Paritosh said it plainly: just add use the id as a key:

mydict[id(myobj)] = myvalue

or the hash of the id:

mydict[hash(id(myobj))] = myvalue #Note you could just hash(myobj) for many objects.

You could wrap this up in some class, but that seems like over kill to me, though it depends on your exact use case. This is not a true hash map though - so collisions may happen and override existing objects. You can add logic to check if the key exists, and then create a list there:

newhash = hash(id(myobj))
if newhash in mydict: #O(1)
    try: mydict[newhash].append(myvalue)
    except AttributeError: mydict[newhash] = [mydict[newhash],myvalue]
else: mydict[newhash] = myvalue

or just use a defaultdict. This may be a better candidate for a new class.

@Patrick raises the concern that what you want to do may make no sense - you would have no idea how to reference the object, without already knowing what id the object is associated with. If you keep some list of ids you added it may make some sense, again depending on use case.

kabanus
  • 24,623
  • 6
  • 41
  • 74
  • how do you retrieve that from the dict without having the object at hand? – Patrick Artner Apr 03 '19 at 18:17
  • 1
    @PatrickArtner What do you mean? like you retrieve anything from a dictionary- with the key. – kabanus Apr 03 '19 at 18:17
  • But if you do not have the object, how do you get its id()? – Patrick Artner Apr 03 '19 at 18:18
  • 1
    How do you retrieve a value from any dictionary without knowing what the key is? How is that a relevant concern? – Jared Smith Apr 03 '19 at 18:19
  • 1
    @PatrickArtner I tried to address your concern in the edit. Let me know if you agree. – kabanus Apr 03 '19 at 18:20
  • but I *DO* have the object. it is stored inside some other python structure. this is a common situation where one wants to associate data with dynamic objects w/o changing their signature. In my case I am storing permission data associated with dynamically created python objects. – Dan Oblinger Apr 03 '19 at 23:16
  • @DanOblinger then it does make sense in your case. That was what mystified Patrick. You should have no problem using the `id` as the key. – kabanus Apr 04 '19 at 05:07
  • @kabanus. Great, if you edit your answer to be " mydict(id(mykey)] = myvalue " I will accept it as the answer. but my real question is still unanswered: Can you accomplish this WITHOUT redundantly having a copy of every objects id in heap memory? do you know the answer? – Dan Oblinger Apr 06 '19 at 10:54
  • @DanOblinger Done. Regarding the rest of your comment, I am not sure what you mean. This is Python, meaning you do not deal with memory at all. Once nothing points to the object, eventually it will be "cleaned" - but then you might duplicate an ID, since a new object can take that ID. You have to allocate room for it somewhere at least once for it to get an ID, but that would be true for any hashing mechanism in any language. Perhaps ask a new question with your exact use case (what are the objects, what should be saved etc...) – kabanus Apr 06 '19 at 18:11
  • @kabanus If you use a traditional dict as described here you will have the actual id stored in memory as a long int for each object that you associate a value with. This is needless redundancy since the id is easily computed from the object itself. A true identity dict would only have a hash entry nothing else. I think this is probably not supported in Python. Thanks – Dan Oblinger Apr 08 '19 at 05:52
  • @DanOblinger Added the way to obtain the `hash` of the `id`. Of course, this no longer guarantees no collisions, as this is not a true hash map. – kabanus Apr 08 '19 at 06:03
  • @DanOblinger See edit - that may be worth wrapping in a class. – kabanus Apr 08 '19 at 06:07