40

Can you dereference a variable id retrieved from the id function in Python? For example:

dereference(id(a)) == a

I want to know from an academic standpoint; I understand that there are more practical methods.

Mark Amery
  • 143,130
  • 81
  • 406
  • 459
Hophat Abc
  • 5,203
  • 3
  • 18
  • 18

4 Answers4

52

Here's a utility function based on a (now-deleted) comment made by "Tiran" in a weblog discussion @Hophat Abc references in his own answer that will work in both Python 2 and 3.

Disclaimer: If you read the the linked discussion, you'll find that some folks think this is so unsafe that it should never be used (as likewise mentioned in some of the comments below). I don't agree with that assessment but feel I should at least mention that there's some debate about using it.

import _ctypes

def di(obj_id):
    """ Inverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

if __name__ == '__main__':
    a = 42
    b = 'answer'
    print(di(id(a)))  # -> 42
    print(di(id(b)))  # -> answer
martineau
  • 119,623
  • 25
  • 170
  • 301
  • 6
    But be warned that this will segfault the interpreter if you pass it an invalid address (say, because it's been garbage collected). – asmeurer Sep 22 '17 at 21:42
  • 9
    Segfault, or worse. For example, there might be some other object living there now, and your program will silently chug along with the wrong object and make wrong decisions as a result, deleting your files or leaking your users' data or revving up the bandsaw your program was supposed to control or worse. – user2357112 Oct 11 '19 at 09:53
  • FWIW, I've actually used something like this to work around some limitations of the [JSON module](https://stackoverflow.com/questions/13249415/how-to-implement-custom-indentation-when-pretty-printing-with-the-json-module). I think there are scenarios where it's "safe enough" to consider using. – martineau Oct 11 '19 at 10:07
  • 3
    @martineau: I wouldn't consider that safe, and certainly not safe enough to post on Stack Overflow for people to copy without even realizing it's doing something unsafe. It would have been much better to traverse the input to find all the NoIndent instances instead of trying to go from ID to object by direct memory access. The use of PyObj_FromPtr is a crash risk and security issue if there's any possibility a string that looks like `@@123@@` could be in the output without coming from a `NoIndent`, and even if that can't happen to you, it could happen to anyone who uses your answer. – user2357112 Oct 11 '19 at 10:25
  • Or, since you have access to the `NoIndent` object in your `default` method, you could have saved a reference to it there (and then cleared the saved references in `encode` or something to avoid keeping objects alive). – user2357112 Oct 11 '19 at 10:33
  • @user2357112: Guess we'll just have to agree to disagree. The pattern used can be easily changed as needed and the whole saving references for later look up has its own potentially serious issues wrt keeping objects alive. – martineau Oct 12 '19 at 19:13
  • I was a little disturbed by the use of the undocumented `_ctypes` module here (which I'd never heard of), and couldn't find much on Google about why it exists, so have asked [a question about it](https://stackoverflow.com/q/58475560/1709587). Perhaps you have wisdom to share about it; if so, I invite you to take a look. – Mark Amery Oct 20 '19 at 17:40
  • 2
    @MarkAmery: You can also do it with `ctypes` if that's what is bothering your (see [answer](https://stackoverflow.com/questions/1396668/get-object-by-id/15702647#15702647)). – martineau Sep 23 '21 at 17:30
  • 1
    There's a race condition here. There's no guarantee that the object with `obj_id` as its identifier still exists by the time you call `PyObj_FromPtr`. I don't know what `PyObj_FromPtr` does if you give it an invalid id, but it can also return a *different* object than the one you expect, if a new object has recycled the id. – chepner Nov 18 '21 at 19:52
  • Does this work on multiprocessing? – N3RDIUM Jan 09 '22 at 15:38
  • 1
    @somePythonProgrammer: No, processes run in separate memory-spaces so passing the id of an object in one of them to another isn't feasible. It would work OK completely *within* one of them though. – martineau Jan 09 '22 at 16:31
  • 1
    `PyObj_FromPtr` is part of the private API. This can be done while using the official API with `ctypes.cast(some_id, ctypes.py_object).value` instead. – mara004 Aug 21 '22 at 10:23
9

Not easily.

You could recurse through the gc.get_objects() list, testing each and every object if it has the same id() but that's not very practical.

The id() function is not intended to be dereferenceable; the fact that it is based on the memory address is a CPython implementation detail, that other Python implementations do not follow.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
8

There are several ways and it's not difficult to do:

In O(n)

In [1]: def deref(id_):
   ....:     f = {id(x):x for x in gc.get_objects()}
   ....:     return f[id_]

In [2]: foo = [1,2,3]

In [3]: bar = id(foo)

In [4]: deref(bar)
Out[4]: [1, 2, 3]

A faster way on average, from the comments (thanks @Martijn Pieters):

def deref_fast(id_):
    return next(ob for ob in gc.get_objects() if id(ob) == id_)

The fastest solution is in the answer from @martineau, but does require exposing python internals. The solutions above use standard python constructs.

martineau
  • 119,623
  • 25
  • 170
  • 301
munk
  • 12,340
  • 8
  • 51
  • 71
  • 5
    You now created a reference to *every value in memory* ensuring that they'll not get garbage collected. You really want to use the [`weakref` module](http://docs.python.org/2/library/weakref.html#weakref.WeakValueDictionary) instead. – Martijn Pieters Feb 21 '13 at 20:52
  • 2
    The OP asked from an academic standpoint, not a practical one. – munk Feb 21 '13 at 20:53
  • 1
    But the OP won't be the only one looking at this; the idea of creating a mapping of id-to-object is nice, but it has certain consequences that you need to be aware of. – Martijn Pieters Feb 21 '13 at 20:56
  • @MartijnPieters: True, but only until the all go away when the function returns and the local variable `f` no longer exists, right? – martineau Feb 21 '13 at 20:57
  • @martineau: good point; indeed, `f` only lives for the duration of the function call. At which point creating the dictionary is a waste of cycles because you may as well just scan through the list for an average cost of n/2 instead of a fixed cost of n (n being the number of objects in memory). – Martijn Pieters Feb 21 '13 at 20:58
  • @MartijnPieters: Perhaps, but it still might be more efficient as well as easier to develop and maintain than not being able to do it all. – martineau Feb 21 '13 at 21:09
  • @martineau: If the `dict()` was a `weakref.WeakValueDictionary()` and not just a local variable, then you can reuse it and then it's more efficient. By building it *each* and *every* time you call `deref()` you force it to loop through all objects. Doing `return next(ob for ob in gc.get_objects() if id(ob) == id_)` would only have to go through half the objects, on average. :-) – Martijn Pieters Feb 21 '13 at 21:12
  • 1
    Not that you *can* reuse such a beast anyway, because next time you call, that same `id` could have been re-used already. – Martijn Pieters Feb 21 '13 at 21:13
  • 1
    @MartijnPieters: Yeah, reuse would be problematic -- for a moment I thought you were saying that the idea might be salvagable if it was a `WeakValueDictionary()` with a longer lifespan. Regardless, this answer as presented with a regular `dict`, _would_ work, albeit somewhat slowly. – martineau Feb 21 '13 at 21:22
5

Note: Updated to Python 3.

Here's yet another answer adapted from a yet another comment, this one by "Peter Fein", in the discussion @Hophat Abc referenced in his own answer to his own question.

Though not a general answer, but might still be useful in cases where you know something about the class of the objects whose ids you want to lookup — as opposed to them being the ids of anything. I felt this might be a worthwhile technique even with that limitation (and sans the safety issues my other answer has). The basic idea is to make a class which keeps track of instances and subclasses of itself.

#!/usr/bin/env python3
import weakref

class MetaInstanceTracker(type):
    """ Metaclass for InstanceTracker. """

    def __new__(cls, name, bases, dic):
        cls = super().__new__(cls, name, bases, dic)
        cls.__instances__ = weakref.WeakValueDictionary()
        return cls

class InstanceTracker(object, metaclass=MetaInstanceTracker):
    """ Base class that tracks instances of its subclasses using weakreferences. """

    def __init__(self, *args, **kwargs):
        self.__instances__[id(self)]=self
        super().__init__(*args, **kwargs)

    @classmethod
    def find_instance(cls, obj_id):
        return cls.__instances__.get(obj_id, None)


if __name__ == '__main__':

    class MyClass(InstanceTracker):
        def __init__(self, name):
            super(MyClass, self).__init__()
            self.name = name
        def __repr__(self):
            return '{}({!r})'.format(self.__class__.__name__, self.name)

    obj1 = MyClass('Bob')
    obj2 = MyClass('Sue')

    print(MyClass.find_instance(id(obj1)))  # -> MyClass('Bob')
    print(MyClass.find_instance(id(obj2)))  # -> MyClass('Sue')
    print(MyClass.find_instance(42))        # -> None

martineau
  • 119,623
  • 25
  • 170
  • 301
  • I agree. Use [weakref](https://docs.python.org/3/library/weakref.html) and set up a tracker. This is the right answer in my view. – root-11 Mar 11 '22 at 23:38