
I was playing around with sys.getsizeof() and found that False (or 0) takes up fewer bytes than True (or 1). Why is that?

import sys

print("Zero: " + str(sys.getsizeof(0)))
print("One: " + str(sys.getsizeof(1)))
print("False: " + str(sys.getsizeof(False)))
print("True: " + str(sys.getsizeof(True)))

# Prints:
# Zero: 24
# One: 28
# False: 24
# True: 28

In fact, other numbers (also some that consist of more than one digit) are 28 bytes.

for n in range(0, 12):
    print(str(n) + ": " + str(sys.getsizeof(n)))

# Prints:
# 0: 24
# 1: 28
# 2: 28
# 3: 28
# 4: 28
# 5: 28
# 6: 28
# 7: 28
# 8: 28
# 9: 28
# 10: 28
# 11: 28

Even more: sys.getsizeof(999999999) is also 28 bytes! sys.getsizeof(9999999999), however, is 32.

So what's going on? I assume that the booleans True and False are internally converted to 1 and 0 respectively, but why is zero different in size from other small integers?

Side question: is this specific to how Python (3) represents these items, or is this generally how numbers are represented by the OS?

Bram Vanroy
  • On Python 2.7, I get "24" for all of these. I reproduced your results on 3.6. This appears to be Python 3 specific. – jordanm Apr 19 '18 at 17:29
  • It's very Python-specific. Most programming languages don't have arbitrary-precision integers; you have to choose data types like `int`, `long`, `long long`, etc., and they each have a fixed size. – Barmar Apr 19 '18 at 17:32
  • @jordanm It's not quite Python 3 specific—you see the same thing in Python 2 if you compare `0L`, `1L`, etc. The difference is that Python 2 had separate types for `int` (fixed-size 32-bit signed integer) and `long` (arbitrary-sized integer), while Python 3 renamed `long` to `int` and got rid of `int`. – abarnert Apr 19 '18 at 17:39

1 Answer


Remember that Python int values are of arbitrary size. How does that work?

Well, in CPython,1 an int is represented by a PyLongObject, which has an array of 4-byte chunks2, each holding 30 bits3 worth of the number.

  • 0 takes no chunks at all.
  • 1 through (1<<30)-1 takes 1 chunk.
  • 1<<30 through (1<<60)-1 takes 2 chunks.

And so on.
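
As a rough check of the chunk counts listed above, here is a minimal sketch. The helper name `chunks_needed` is made up for illustration and is not part of CPython, and the 30-bit default assumes the chunk width described above (some builds use 15 bits; `sys.int_info.bits_per_digit` reports the actual value).

```python
import sys


def chunks_needed(n, bits_per_chunk=30):
    """Return how many fixed-width chunks are needed to hold abs(n).

    Hypothetical helper for illustration only; zero needs no chunks.
    """
    magnitude_bits = abs(n).bit_length()
    # Ceiling division using integers only.
    return -(-magnitude_bits // bits_per_chunk)


# Print each boundary value with its chunk count and its reported size.
for n in (0, 1, (1 << 30) - 1, 1 << 30, (1 << 60) - 1, 1 << 60):
    print(n, chunks_needed(n), sys.getsizeof(n))
```

The sizes printed in the last column depend on the interpreter version and build, so treat them as something to compare against the chunk counts rather than as fixed numbers.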

This is slightly oversimplified; for full details, see longintrepr.h in the source.


In Python 2, there are two separate types, called int and long. An int is represented by a C 32-bit signed integer4 embedded directly in the header, instead of an array of chunks. A long is like a Python 3 int.

If you do the same test with 0L, 1L, etc., to explicitly ask for long values, you will get the same results as in Python 3. But without the L suffix, any literal that fits in 32 bits gives you an int, and only literals that are too big give you longs.5 (This means that (1<<31)-1 is an int, but 1<<31 is a 2-chunk long.)


1. In a different implementation, this might not be true. IIRC, Jython does roughly the same thing as CPython, but IronPython uses a C# "bignum" implementation.

2. It uses the C "struct hack". Technically, a PyLongObject is 28 bytes, but nobody ever allocates a PyLongObject; they malloc 24, 28, 32, 36, etc. bytes, then cast to PyLongObject *.

3. Why 30 bits instead of 32? Mainly because the implementation of pow and ** can be simpler and faster if it can assume that the number of bits in two "digits" is divisible by 10.

4. In fact, a Python 2 int is a C long, just to make things confusing. So the C API is full of things like PyInt_FromLong where the long means "32-bit int" and PyLong_FromSize_t where the long means "bignum".

5. Early versions of Python 2.x didn't integrate int and long as nicely, but hopefully nobody has to worry about those anymore.
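
The "header plus N chunks" allocation described in the struct-hack footnote can be observed from Python itself. This is a sketch assuming CPython, where `int.__basicsize__` reports the fixed part of the object and `int.__itemsize__` the size of one chunk; `expected_size` is a hypothetical helper, and it is only meant for nonzero values because the size reported for zero is not the same in every CPython version.

```python
import sys

header_bytes = int.__basicsize__              # fixed part, before the chunk array
chunk_bytes = int.__itemsize__                # bytes per chunk ("digit")
bits_per_chunk = sys.int_info.bits_per_digit  # 30 on typical 64-bit builds


def expected_size(n):
    """Predict sys.getsizeof(n) for a nonzero int (illustrative helper)."""
    chunks = -(-abs(n).bit_length() // bits_per_chunk)  # ceiling division
    return header_bytes + chunks * chunk_bytes


# Compare the prediction with what the interpreter reports.
for n in (1, 999999999, 9999999999, 1 << 60):
    print(n, expected_size(n), sys.getsizeof(n))
```

On a CPython build the two numbers on each line should agree, which is the struct hack at work: the same type, allocated with a different number of trailing chunks.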

abarnert
  • In layman's terms, would it make sense to say that the empty `PyLongObject` takes 24 bytes, and that a `PyLongObject` with an empty array is the same as zero? As soon as you actually have an int > 0, the object's array isn't empty anymore, so its array of 4-byte chunks gets filled, which explains the increase in size? – Bram Vanroy Apr 20 '18 at 07:52
  • @BramVanroy Yes, that's right (to understand it any deeper, you need to know about the C ["struct hack"](https://stackoverflow.com/questions/16553542)). – abarnert Apr 21 '18 at 00:05