Background

I recently learned that this is because the garbage collector could clear the contents of that location at any time, so relying on it would be a bad idea. There could be other reasons too, but I don't know them.

I also learned that we could access an object given its location using C, because in CPython the id of an object is its address. (I should thank the IRC guys for this.) But I haven't tried it.

I am talking about this address (id):

address = id(object_name)

or maybe this one (if that helps):

hex_address = hex(id(object))

Anyway, I still think it would have been better if they had provided some method that could do that for me.

I wouldn't want to use such a method in practice, but it bothers me that we have an object and something that gives its address, but nothing that does the reverse.

Question

  • Why was this decision made?
  • Can we do this using some crazy introspection or hack at the Python level? I have been told we can't do that at the Python level, but I just wanted to be sure.
Bharadwaj Srigiriraju
  • 1
    Can you provide a scenario where this is needed? – mmmmmm Sep 28 '13 at 10:49
  • 2
    `id()` might give you the address in some Python interpreters, but that's an [implementation detail](http://docs.python.org/2/library/functions.html#id). – Matthias Sep 28 '13 at 11:06
  • 2
    What's an address? I've written an identity hash table for Python in C, but even I can afford to forget these implementation details 99% the time I'm working with Python. You're doing something wrong (or very advanced and hacky) if you even have the concept of pointers in your mind while writing Python. –  Sep 28 '13 at 11:28
  • @Mark: Almost everywhere I could have a need for this, I have an alternative way to access the object itself. I was using Django messages and I wanted to know what an object like this would look like `{'messages': }`. A quick search and having a hex value like that indicated that this would be the location of that object. So, I thought of accessing this object from `./manage.py shell` using that location (why else would they give it to me?), and then I found out I couldn't. – Bharadwaj Srigiriraju Sep 28 '13 at 12:34
  • @Matthias: and we're not supposed to use it anywhere? – Bharadwaj Srigiriraju Sep 28 '13 at 12:37
  • 1
    We're supposed to use `id()` for identity checks. Python doesn't want you to deal with memory addresses. – Matthias Sep 28 '13 at 12:40

2 Answers

The simplest answer would be: "because it is not needed, and it is easier to maintain the code without low level access to variables".

A more elaborate answer is that everything you could do with such a pointer you can also do with ordinary references in Python, or with weak references (if you want to refer to an object without preventing its garbage collection).

Regarding "hacking":

  1. You can iterate through the objects tracked by the garbage collector and pick out the one with the matching id

    import gc
    
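    def objects_by_id(id_):
        # Only sees objects tracked by the garbage collector (mostly containers).
        for obj in gc.get_objects():
            if id(obj) == id_:
                return obj
        return None  # no tracked object currently has this id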
    
  2. You can use mxtools

    import mx.Tools

    mx.Tools.makeref(id_)
    
  3. You can use ctypes

    import ctypes

    # CPython only: id_ must be the id of an object that is still alive.
    ctypes.cast(id_, ctypes.py_object).value
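
The gc and ctypes routes above can be sketched together as follows. This is a minimal sketch that assumes CPython, where `id()` happens to return the object's address. The class `Thing` and both helper names are hypothetical, introduced only for illustration. The ctypes version has undefined behaviour (typically a crash) if no live object has the given id.

```python
import ctypes
import gc


class Thing:
    """Hypothetical placeholder class used only for this illustration."""


def object_by_id_gc(id_):
    # Scans objects tracked by the garbage collector; untracked objects
    # such as ints and strings will never be found this way.
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
    return None


def object_by_id_ctypes(id_):
    # Assumption: CPython, where id() is the memory address of the object.
    # Passing the id of a dead object is undefined behaviour.
    return ctypes.cast(id_, ctypes.py_object).value


thing = Thing()
thing_id = id(thing)

found_gc = object_by_id_gc(thing_id)
found_ctypes = object_by_id_ctypes(thing_id)
print(found_gc is thing, found_ctypes is thing)
```

Both lookups only work here because `thing` is still referenced, which is exactly the situation in which you could have used the reference directly.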
    
lejlot
As I wrote elsewhere:


id is only defined as a number unique to the element among currently existing elements. Some Python implementations (in fact, all main ones but CPython) do not return the memory address.

%~> pypy
Python 2.7.3 (480845e6b1dd219d0944e30f62b01da378437c6c, Aug 08 2013, 17:02:19)
[PyPy 2.1.0 with GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``arguably, everything is a niche''
>>>> a = 1
>>>> b = 2
>>>> c = 3
>>>> id(a)
9L
>>>> id(b)
17L
>>>> id(c)
25L

So you first have to be sure that the id really is the memory address. Furthermore, because of this, Python provides no id → object mapping, especially since an id can come to refer to a different object once the original is deleted.

You have to ask why you're holding the id. If it's for space reasons, bear in mind that containers actually hold references to items, so [a, a, a, a, a] actually takes less space than [id(a), id(a), id(a), id(a), id(a)] together with a itself.
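
A rough way to see this, as a sketch that assumes CPython and uses `sys.getsizeof` (which reports only the shallow size of each object). The names `refs`, `ids`, `refs_size` and `ids_size` are made up for this illustration, and the exact numbers vary by platform and version.

```python
import sys

a = object()

refs = [a, a, a, a, a]
ids = [id(a), id(a), id(a), id(a), id(a)]

# Each list stores five references, so the list objects themselves
# cost the same. The id list additionally needs int objects to point at;
# counting just one of them gives a lower bound on the extra cost.
refs_size = sys.getsizeof(refs)
ids_size = sys.getsizeof(ids) + sys.getsizeof(ids[0])

print(refs_size, ids_size)
```

In both cases `a` itself still has to exist somewhere if you ever want to get back to it, so storing ids saves nothing.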

You can also consider making a dict of {id: val} for all the relevant items and storing that. This will keep val alive, so you can use weakrefs to allow the vals to be garbage collected. Remember, use weakref if you want a weakref.
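
A minimal sketch of that idea, assuming the items are instances of a user-defined class (built-in types such as int, list and dict cannot be weakly referenced directly). `Item`, `registry` and `remember` are hypothetical names chosen for this example.

```python
import weakref


class Item:
    """Hypothetical class; plain user-defined instances support weak references."""

    def __init__(self, name):
        self.name = name


# Maps id -> object without keeping the objects alive.
registry = weakref.WeakValueDictionary()


def remember(obj):
    # Store the object under its id and hand the id back to the caller.
    key = id(obj)
    registry[key] = obj
    return key


item = Item("example")
key = remember(item)

# While the object is alive, the id can be turned back into the object.
looked_up = registry.get(key)
print(looked_up is item)
```

Once the last strong reference to an item goes away, its entry disappears from `registry`, so a stale id yields `None` from `registry.get` rather than some unrelated object.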


So basically it's because there's no reliable solution that's platform-independent.

it bothers me that we have an object and something that would give its address

Then just remember that we do not. CPython only optimises id under the (correct) assumption that the address is unique. You should never treat it as an address because it's not defined to be one.


Why was this decision made?

Because if we could access things by their id, we'd be able to do all sorts of stupid stuff like accessing uninitialised memory. It also prevents interpreters from optimising things by moving objects around in memory (JIT compilers like PyPy could not exist as easily if items had to have fixed memory addresses). Furthermore, there is no guarantee that the item is either alive or even the same item at any point.

When references take less space than an integer (which is a reference + a numeric object), there is no point in not just using a reference (or a weakref if preferred), which will always do the right thing.

Veedrac
  • 2
    Moving objects in memory has nothing to do with JIT compilers. It's the concern of the garbage collector, and if CPython wasn't eternally wedded to its C API it too could move objects in memory. PyPy has features besides JIT compilation ;-) –  Sep 28 '13 at 11:30
  • I was under the impression that JIT-compiled objects get a lower-level representation, which would naturally have a different memory address. Keeping track of said address would add a lot of burden for little reward. Fair point either way. – Veedrac Sep 28 '13 at 11:33
  • 2
    Objects don't get JIT compiled. Code gets JIT compiled. If the compiled code allocates some object that it can prove doesn't escape the trace (e.g. a temporary integer), it avoids allocating it at all. And sometimes parts of an object are not kept up-to-date all the time (virtualizables). But in any case, as soon as the code needs an address, the object *is* allocated and updated fully (or the optimization is not performed at all). PyPy already handles all sorts of things the JIT compiler can't handle (well). That's the beauty of tracing JIT compilers. –  Sep 28 '13 at 12:44