
I've been experimenting with sys.getrefcount() and noticed something strange. Why does this script print 108 instead of the expected 2 (one reference from x plus the argument passed to getrefcount())?

import sys
x = 1
print(sys.getrefcount(x))  # 108
planetp
  • Maybe Python has 106 built-in variables? – TheTechRobo the Nerd May 11 '20 at 20:21
  • 3
    **variables** do not have reference counts, **objects** do. As an implementation detail, CPython caches small integers. Apparently, the integer object `1` is referenced that many times - which isn't surprising. Try it with something like `1000` – juanpa.arrivillaga May 11 '20 at 20:21
  • The number 1 is probably used lots of times inside the interpreter. I suppose you get much lower refcounts if you use a more "unusual" number like 7. – mkrieger1 May 11 '20 at 20:23
  • 1
    @mkrieger1 - python caches small integers. You'll need to go above 256. – tdelaney May 11 '20 at 20:25
  • 3
    No, I mean that despite being cached, 7 will have a lower refcount because it is used more rarely than 1. – mkrieger1 May 11 '20 at 20:31
  • Try a float number, `sys.getrefcount(0.5642) = 3`. I'd like to know why nothing I try goes below `3` though. – Paddy Harrison May 11 '20 at 20:46
  • @PaddyHarrison in my testing, lists will generally yield only `1` reference, including the empty list. However, the empty tuple has over a thousand references (as does `None`) :) I also get only `2` references for negative integers `-6` and below. – Karl Knechtel May 11 '20 at 20:59
  • @KarlKnechtel my system agrees with yours 100%! I wonder whether it's OS dependent, are you also running MacOS? – Paddy Harrison May 11 '20 at 21:03
  • No, I'm on Win10. I think it probably depends more on the Python version and on details like whether Tkinter is installed. – Karl Knechtel May 11 '20 at 21:16

1 Answer

The exact numbers will depend on the details of your installation. For me:

Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getrefcount(1)
101
>>> x = 1
>>> sys.getrefcount(x)
102
>>> sys.getrefcount(1)
102
>>> sys.getrefcount(2)
77
>>> sys.getrefcount(3)
32
>>> x = 256
>>> y = 256
>>> x is y
True
>>> x = 256
>>> y = 257
>>> x is y
False
>>> sys.getrefcount(255)
4
>>> sys.getrefcount(256)
19
>>> sys.getrefcount(257)
3
>>> x = -5
>>> y = -5
>>> x is y
True
>>> x = -6
>>> y = -6
>>> x is y
False
>>> sys.getrefcount(-5)
3
>>> sys.getrefcount(-6)
2

The example is constructed so as to highlight:

  • It is not x that has a reference count, but 1. Variable names don't have references; they are the references. The referred-to things are the values - under-the-hood chunks of memory that store the information needed to represent concepts like 1.

  • Even if it made sense to talk about reference counts of variables, sys.getrefcount - like any other function - receives a value, not a variable, as its argument. Even if you write sys.getrefcount(x), the internals of sys.getrefcount have no way to know about x.

  • Different integer values will have different reference counts. This is because the interpreter sets up a lot of internal state ahead of time (imported modules, compiled code, internal tables), and anything in that state that stores a numeric value holds a reference to the object representing that value.

  • The internal logic of the reference C implementation of Python ensures that, for small integers, the same object is used to represent a given value everywhere it's used: when the value 1 results from a computation, the existing 1 object is returned instead of a new object representing the same value. The range is apparently -5 to 256 inclusive, which matches my recollection, but do not rely on this behaviour; it is at best not useful for any real purpose and generally a really good way to introduce bugs and confusion into your code. Values inside this range have at least one extra reference, because the cache that hands those objects out has to hold on to them. They may, of course, have a lot more references; evidently the number 256 is more important than the number 255 for the internal implementation details.

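  • However, even numbers outside that range will report having two or three references. This comes from ordinary reference bookkeeping rather than from the garbage collector: one reference is always the temporary one created by passing the object to sys.getrefcount, and when a literal is typed at the prompt, the compiled code for that line holds a reference to the constant as well. I have no idea why positive integers appear to report one more reference than negative ones, although it may matter that -6 is not a literal; it is parsed as a unary minus applied to 6.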
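
The points above can be checked with a short script. This is only a sketch: the function names `refcount_delta_from_alias` and `probe_small_int_cache` are made up for illustration, and everything it measures is a CPython implementation detail, so other interpreters or future versions may give different results. It avoids integer literals on purpose, building the values at run time with `int(str(n))` so that the compiler cannot merge them into a single constant.

```python
import sys


def refcount_delta_from_alias():
    """Return how much the count rises when one more name is bound to an object."""
    obj = object()                 # fresh object; nothing else refers to it
    before = sys.getrefcount(obj)
    alias = obj                    # a second name for the same object
    after = sys.getrefcount(obj)
    del alias
    return after - before


def probe_small_int_cache(limit=2000):
    """Find the run of integers around 0 that are handed out as shared objects.

    `limit` is an arbitrary bound (an assumption of this sketch) that stops
    the search if an interpreter happens to share every integer tested.
    """
    def shared(n):
        # Two separate run-time conversions; identical objects mean a cache hit.
        return int(str(n)) is int(str(n))

    high = 0
    while high < limit and shared(high + 1):
        high += 1
    low = 0
    while low > -limit and shared(low - 1):
        low -= 1
    return low, high


if __name__ == "__main__":
    print("extra references from one alias:", refcount_delta_from_alias())
    print("shared integer range:", probe_small_int_cache())
```

Comparing two counts of the same object, rather than reading one absolute number, sidesteps the question of how many hidden references the call itself adds.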

Karl Knechtel