
This is an attempt to better understand how reference count works in Python.

Let's create a class and instantiate it. The instance's reference count is 1 (getrefcount displays 2 because the argument passed to getrefcount is itself a temporary reference to the instance, which raises the count by 1):

>>> from sys import getrefcount as grc
>>> class A():
...     def __init__(self):
...         self.x = 100000
...

>>> a = A()
>>> grc(a)
2
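The effect of one extra binding on the reported count can be checked directly. This is a minimal sketch, assuming CPython. The helper name count_delta is made up for illustration, and it compares only the difference between two counts because the absolute numbers vary between interpreter versions:

```python
from sys import getrefcount


def count_delta():
    """Return how much one extra name binding changes the reported count."""
    obj = object()
    # On most CPython versions this figure also includes the temporary
    # reference created by passing obj to getrefcount.
    before = getrefcount(obj)
    alias = obj                 # one additional reference to the same object
    after = getrefcount(obj)
    del alias
    return after - before


delta = count_delta()
print(delta)
```

Comparing differences rather than absolute values sidesteps the question of how many temporary references the call itself adds.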

a's internal variable x has 2 references:

>>> grc(a.x)
3

I expected it to be referenced by a and by A's __init__ method. Then I decided to check.

So I created a temporary variable b in the __main__ namespace just to be able to access the variable x. That increased the reference count by 1, to 3 (as expected):

>>> b = a.x
>>> grc(a.x)
4

Then I deleted the class instance and the ref count decreased by 1:

>>> del a
>>> grc(b)
3

So now there are 2 references: one is by b and one is by A (as I expected).

By deleting A from the __main__ namespace I expect the count to decrease by 1 again.

>>> del A
>>> grc(b)
3

But that doesn't happen. There is no longer a class A or any instance of it that could reference 100000, yet it is still referenced by something other than b in the __main__ namespace.

So, my question is, what is 100000 referenced by apart from b?


BrenBarn suggested that I should use object() instead of a number which may be stored somewhere internally.

>>> class A():
...     def __init__(self):
...         self.x = object()
...

>>> a = A()
>>> b = a.x
>>> grc(a.x)
3
>>> del a
>>> grc(b)
2

After deleting the instance a there was only one reference, held by b, which is very logical.

The only thing left to understand is why it doesn't work that way with the number 100000.

ovgolovin

2 Answers


a.x is the integer 100000. This constant is referenced by the code object corresponding to the __init__() method of A. Code objects always include references to all literal constants in the code:

>>> def f(): return 10000
>>> f.__code__.co_consts
(None, 10000)
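The same check can be applied to the class from the question. This is a minimal sketch, assuming CPython; the names consts and shared are only for illustration. It looks at whether the object stored in a.x is the very object held in the co_consts tuple of __init__:

```python
class A:
    def __init__(self):
        self.x = 100000  # literal constant, stored in __init__'s code object


consts = A.__init__.__code__.co_consts
a = A()

# True when a.x is the very same object that the code object holds on to.
shared = any(c is a.x for c in consts)
print(100000 in consts, shared)
```

If shared is true, the code object contributes one of the references that getrefcount reports for a.x, independently of any instance.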

The line

del A

only deletes the name A and decreases the reference count of A. In Python 3.x (but not in 2.x), classes often include some cyclic references, and hence are only freed when the cyclic garbage collector runs, for example when you run it explicitly. And indeed, using

import gc
gc.collect()

after del A does lead to the reduction of the reference count of b.
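One way to watch this happen is with a weak reference to the class. This is a sketch under the assumption of CPython 3; the function name class_survives_del is hypothetical, and automatic collection is switched off during the experiment so that only the explicit gc.collect() call can free the class:

```python
import gc
import weakref


def class_survives_del():
    """Return (alive_before, alive_after) around an explicit gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()               # keep automatic collection out of the experiment
    try:
        class A:
            def __init__(self):
                self.x = 100000

        ref = weakref.ref(A)   # a weak reference does not keep the class alive
        del A                  # removes only the local name
        alive_before = ref() is not None
        gc.collect()           # lets the cycle collector free the class
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


alive_before, alive_after = class_survives_del()
print(alive_before, alive_after)
```

Running this in a script rather than in the interactive interpreter keeps the interpreter's own bookkeeping (such as `_`) from holding extra references.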

Sven Marnach
  • Shouldn't deleting `A` lead to deleting `__init__` as it's only referenced by `A` (as I understand)? – ovgolovin Jun 03 '12 at 22:51
  • Oh. I see now. The reference count of `A` is `4` before deleting it from `__main__`. So this deletion will only reduce it by `1`. What are the other objects that reference `A`? – ovgolovin Jun 03 '12 at 23:05
  • @ovgolovin: I just noticed I cannot reproduce your results. For me, the class `A` actually *does* get deleted, and the reference count of `b` *does* decrease. The only other thing I can think of: Be careful with the `_` of the interactive interpreter. When in doubt, better do experiments with reference counts in a script rather than in the interactive interpreter. – Sven Marnach Jun 03 '12 at 23:09
  • I use Python 3.2, if it matters. – ovgolovin Jun 03 '12 at 23:10
  • I checked. Python 3.2 prints `5` as the result of [this code](http://ideone.com/SuEri). And Python 2.6 prints `2`. – ovgolovin Jun 03 '12 at 23:18
  • And, if I understand it right, in the case of Python 3.2, constant `10000` is referenced by `__init__` code object of `A` which is not garbage collected as it has 3 references apart from `A` in `__main__`. – ovgolovin Jun 03 '12 at 23:19
  • Could you please write a few words about the code object that references constants in function definitions (which explains the difference between using `100000` and `object()`)? I think it may be useful for those who will be reading this thread. – ovgolovin Jun 03 '12 at 23:23
  • @ovgolovin: Code objects also reference constants in Python 2.x. The difference is that there appear to be back-references to the class object somewhere in Python 3.x (probably in the MRO) that do not exist in 2.x, and those back-references prevent immediate garbage collection. – Sven Marnach Jun 03 '12 at 23:25
  • And what comments should I leave? – ovgolovin Jun 03 '12 at 23:32
  • I usually read all the comments if the topic is interesting, even those which were later disproved (because it lets me understand the logic and the source of mistakes and learn something even from them). – ovgolovin Jun 03 '12 at 23:37
  • @ovgolovin: Well, fine. :) I try to clean up really useless stuff, also to improve search engine results. – Sven Marnach Jun 03 '12 at 23:49

It's likely that this is an artifact of your using an integer as your test value. Python sometimes stores integer objects for later re-use, because they are immutable. When I run your code using self.x = object() instead (which will always create a brand-new object for x) I do get grc(b)==2 at the end.

BrenBarn
  • You are right! Changing `100000` to `object()` altered the numbers returned by `getrefcount`. I'll update the question. – ovgolovin Jun 03 '12 at 22:56
  • @ovgolovin: Of course it does, because the code object can only reference *constants* appearing in the code. This is completely unrelated to any integer object reuse, though. Integers are not randomly cached for later reuse. Python only holds a cache of small integers (usually from -5 to 256), which are created at interpreter start and used whenever necessary. All other integers are created on demand and never reused. – Sven Marnach Jun 03 '12 at 22:59
  • Downvoting: While the observation is accurate, the reason given in this answer is wrong. – Sven Marnach Jun 03 '12 at 23:01
  • The difference is that `object()` is only called when an **instance** of `A` is initialized. But `100000` is referenced by `A`'s `__init__` code object. So the reference counts are different. See **Sven Marnach**'s answer for a detailed explanation. – ovgolovin Jun 03 '12 at 23:44
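
The distinction drawn in these comments can be made visible with an identity check. This is a minimal sketch, assuming CPython; the class names WithConstant and WithObject are invented for the example. It compares the x attribute of two separate instances in each case:

```python
class WithConstant:
    def __init__(self):
        self.x = 100000      # loaded from __init__'s code object on every call


class WithObject:
    def __init__(self):
        self.x = object()    # built fresh on every call


# Two instances of WithConstant share one integer object; two instances of
# WithObject each get their own.
same_constant = WithConstant().x is WithConstant().x
same_object = WithObject().x is WithObject().x
print(same_constant, same_object)
```

The shared integer outlives any single instance because the code object keeps it alive, which is why its count does not drop to the same value as in the object() case.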