76

Experimenting with magic methods (__sizeof__ in particular) on different Python objects I stumbled over the following behaviour:

Python 2.7

>>> False.__sizeof__()
24
>>> True.__sizeof__()
24

Python 3.x

>>> False.__sizeof__()
24
>>> True.__sizeof__()
28

What changed in Python 3 that makes the size of True greater than the size of False?

Peter Mortensen
Simon Fromme
  • 3
    [Related](https://stackoverflow.com/questions/10365624/sys-getsizeofint-returns-an-unreasonably-large-value#comment77604598_10365639), the same behavior appears for `0` vs `1` – user3483203 Oct 26 '18 at 20:41
  • Note that assessing memory consumption using ``sys.getsizeof`` and ``__sizeof__`` (the latter misses GC overhead) will lead to misleading results unless one really, really understands the Python interpreter. PyPy considers it an **error** to use any of these. For example, integers -5 <= i <= 256 are singletons in CPython - ``[1, 1]`` and ``[1, 1, 1, 1]`` only differ by two additional pointers in size. In your case, you would have to find out whether ``True`` and ``1`` share the same memory for their value. – MisterMiyagi Oct 27 '18 at 08:57

4 Answers

71

It is because bool is a subclass of int in both Python 2 and 3.

>>> issubclass(bool, int)
True

But the int implementation has changed.

In Python 2, int was the fixed-width type of 32 or 64 bits, depending on the system, as opposed to the arbitrary-length long.

In Python 3, int is arbitrary-length - the long of Python 2 was renamed to int and the original Python 2 int was dropped altogether.


In Python 2 you get exactly the same behaviour for the long objects 1L and 0L:

Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(1L)
28
>>> sys.getsizeof(0L)
24

The long/Python 3 int is a variable-length object, just like a tuple - when it is allocated, enough memory is allocated to hold all the binary digits required to represent it. The length of the variable part is stored in the object header. 0 requires no binary digits (its variable length is 0), but even 1 spills over, and requires one extra digit.
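The growth of the variable part can be observed from Python itself. The following is only a minimal sketch, assuming CPython 3: `digits_needed` and `extra_bytes` are hypothetical helpers introduced here for illustration, while `sys.int_info` is the standard attribute describing the digit size of the running interpreter. Sizes are compared relative to `1` rather than `0`, because how zero is accounted for is an implementation detail.

```python
import sys

# Digit geometry of the running interpreter (30 bits in 4 bytes by default).
BITS = sys.int_info.bits_per_digit
DIGIT_BYTES = sys.int_info.sizeof_digit


def digits_needed(n):
    """Hypothetical helper: internal digits needed for abs(n), with n != 0."""
    return -(-abs(n).bit_length() // BITS)  # ceiling division


def extra_bytes(n):
    """Bytes that n occupies beyond a one-digit int such as 1."""
    return sys.getsizeof(n) - sys.getsizeof(1)


# Print bit length, digit count and reported size for growing magnitudes.
for n in (1, 2**BITS - 1, 2**BITS, 2**(2 * BITS), 2**(10 * BITS)):
    print(n.bit_length(), digits_needed(n), sys.getsizeof(n))
```

Each additional digit should add `sys.int_info.sizeof_digit` bytes to the reported size.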

I.e. 0 is represented as a binary string of length 0:

<>

and 1 is represented as a 30-bit binary string:

<000000000000000000000000000001>

The default configuration in Python stores 30 bits per digit in a uint32_t, so 2**30 - 1 still fits in 28 bytes on x86-64, while 2**30 requires 32.

2**30 - 1 is represented as

<111111111111111111111111111111>

i.e. all 30 value bits set to 1; 2**30 needs a second digit, and it has the internal representation

<000000000000000000000000000001000000000000000000000000000000>
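The notation above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not CPython's actual code: `SHIFT = 30` is assumed to match the default build described above, `to_digits` and `render` are hypothetical names, and only non-negative values are handled.

```python
SHIFT = 30  # assumed bits per internal digit (default build described above)


def to_digits(n):
    """Split a non-negative int into SHIFT-bit digits, least significant first."""
    digits = []
    while n:
        digits.append(n & ((1 << SHIFT) - 1))  # keep the lowest SHIFT bits
        n >>= SHIFT                            # move on to the next digit
    return digits


def render(n):
    """Render the digits in the <...> notation, most significant digit first."""
    return "<" + "".join(f"{d:0{SHIFT}b}" for d in reversed(to_digits(n))) + ">"


for n in (0, 1, 2**30 - 1, 2**30):
    print(len(to_digits(n)), render(n))
```

Zero yields an empty digit list, values up to 2**30 - 1 yield one digit, and 2**30 is the first value that needs two.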

As for True using 28 bytes instead of 24 - you need not worry. True is a singleton and therefore only 4 bytes are lost in total in any Python program, not 4 for every usage of True.
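That True is a singleton can be checked with identity comparisons. A minimal sketch, in which the variable names are illustrative only:

```python
# Every expression that evaluates to a true bool yields the very same object.
values = [True, bool(1), 2 > 1, not False, bool("x")]

all_same_object = all(v is True for v in values)
distinct_ids = {id(v) for v in values}

print(all_same_object)
print(len(distinct_ids))
```

Because all of these names refer to one object, the extra 4 bytes exist once per process, no matter how many references to True there are.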

22

Both True and False are longobjects in CPython:

struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};

struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};

You can thus say that a Boolean is a subclass of int where True takes the value 1 and False takes the value 0. Accordingly, PyVarObject_HEAD_INIT is called with a reference to PyBool_Type as the type parameter and with 0 and 1, respectively, as ob_size.

Since Python 3, there is no long anymore: the two types have been merged, and an int object takes a different size depending on the magnitude of the number.

If we inspect the source code of the longobject type, we see:

/* Long integer representation.
   The absolute value of a number is equal to
        SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero. Also, in all cases, for all valid i,
        0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
   CAUTION: Generic code manipulating subtypes of PyVarObject has to
   aware that ints abuse ob_size's sign bit.
*/

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

To make a long story short, a _longobject can be seen as an array of "digits", where a digit is not a decimal digit but a group of bits that can be added, multiplied, etc.
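The formula from the quoted comment can be written out in Python to make the roles of ob_size and ob_digit concrete. This is only an illustration of the layout as quoted here: `long_value` is a hypothetical function, and `SHIFT = 30` is an assumption about the build.

```python
SHIFT = 30  # assumed digit width in bits


def long_value(ob_size, ob_digit):
    """Evaluate the formula from the comment: the sign comes from ob_size,
    the magnitude from the first abs(ob_size) digits."""
    magnitude = sum(ob_digit[i] * 2 ** (SHIFT * i) for i in range(abs(ob_size)))
    return -magnitude if ob_size < 0 else magnitude


print(long_value(0, [0]))      # False: ob_size 0, so no digit is read
print(long_value(1, [1]))      # True: one digit holding 1
print(long_value(2, [0, 1]))   # two digits
print(long_value(-1, [5]))     # negative ob_size marks a negative number
```

With ob_size == 0 the sum is empty, which is why False needs no digits while True needs one.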

As the comment specifies:

   zero is represented by ob_size == 0.

So if the value is zero, no digits are added, whereas small integers (values less than 2**30 in CPython) take one digit, and so on.

In Python 2, there were two representations for numbers: ints (with a fixed size, which you could see as "one digit") and longs (with multiple digits). Since bool was a subclass of int, both True and False occupied the same space.

Willem Van Onsem
  • Thanks for the good answer! If SO would allow me to accept multiple answers I would have accepted yours as well! Guess the next step to become a better Python programmer will be to familiarize myself with the important bits of the CPython source... – Simon Fromme Oct 29 '18 at 17:48
6

I haven't seen the CPython code for this, but I believe this has something to do with the optimization of integers in Python 3. Probably, as long was dropped, some optimizations were unified. int in Python 3 is an arbitrary-sized int – the same as long was in Python 2. As bool is stored in the same way as the new int, it affects both.

Interesting part:

>>> (0).__sizeof__()
24

>>> (1).__sizeof__()  # Here one more "block" is allocated
28

>>> (2**30-1).__sizeof__()  # This is the maximum integer size fitting into 28
28

Adding the bytes for the object header should complete the equation.

Peter Mortensen
Slam
  • 1
    Actually, now in Python 3, `int` is just what Python 2 `long` was, it was really `int` that was "dropped" – juanpa.arrivillaga Oct 26 '18 at 20:48
  • Internally – absolutely true, I'm talking about names, but thx for clarification – Slam Oct 26 '18 at 20:48
  • Indeed, in CPython 3 source code, it's still longobject – juanpa.arrivillaga Oct 26 '18 at 20:50
  • 1
    It is actually deoptimization... pessimization? – Antti Haapala -- Слава Україні Oct 26 '18 at 21:09
  • 1
    It's probably not. I may be wrong, but this overhead does not cost as much as two different types + seamless cast in runtime. The truth is, you rarely operate on such a huge number of ints in a normal Python program to care about this extra memory. If you do – you should use something like numpy to run with ~0 overhead and have pure int32/64 – no headers, no reallocation. But overflows from time to time ) – Slam Oct 26 '18 at 21:33
5

Take a look at the CPython code for True and False.

Internally, each is represented as an integer:

PyTypeObject PyBool_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "bool",
        sizeof(struct _longobject),
        0,
        0,                                          /* tp_dealloc */
        0,                                          /* tp_print */
        0,                                          /* tp_getattr */
        0,                                          /* tp_setattr */
        0,                                          /* tp_reserved */
        bool_repr,                                  /* tp_repr */
        &bool_as_number,                            /* tp_as_number */
        0,                                          /* tp_as_sequence */
        0,                                          /* tp_as_mapping */
        0,                                          /* tp_hash */
        0,                                          /* tp_call */
        bool_repr,                                  /* tp_str */
        0,                                          /* tp_getattro */
        0,                                          /* tp_setattro */
        0,                                          /* tp_as_buffer */
        Py_TPFLAGS_DEFAULT,                         /* tp_flags */
        bool_doc,                                   /* tp_doc */
        0,                                          /* tp_traverse */
        0,                                          /* tp_clear */
        0,                                          /* tp_richcompare */
        0,                                          /* tp_weaklistoffset */
        0,                                          /* tp_iter */
        0,                                          /* tp_iternext */
        0,                                          /* tp_methods */
        0,                                          /* tp_members */
        0,                                          /* tp_getset */
        &PyLong_Type,                               /* tp_base */
        0,                                          /* tp_dict */
        0,                                          /* tp_descr_get */
        0,                                          /* tp_descr_set */
        0,                                          /* tp_dictoffset */
        0,                                          /* tp_init */
        0,                                          /* tp_alloc */
        bool_new,                                   /* tp_new */
    };

    /* The objects representing bool values False and True */

    struct _longobject _Py_FalseStruct = {
        PyVarObject_HEAD_INIT(&PyBool_Type, 0)
        { 0 }
    };

    struct _longobject _Py_TrueStruct = {
        PyVarObject_HEAD_INIT(&PyBool_Type, 1)
        { 1 }
    };
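
The tp_base slot in the type definition above is visible from Python as `__base__`, so the relationship can be confirmed without reading the C source. A small sketch, assuming CPython 3, where the variable names are illustrative only:

```python
# tp_base (&PyLong_Type in the C definition) is exposed as __base__.
base = bool.__base__
mro = bool.__mro__

print(base)
print(mro)

# Because bool inherits from int, its two instances take part in arithmetic.
total = True + True
print(total, isinstance(False, int))
```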
Kamil Niski
  • I think i just was not as quick to answer properly as others :) – Kamil Niski Oct 26 '18 at 21:00
  • 2
    The answer only gets halfway there. It's represented as an integer.. yeah? And so what? – wim Oct 26 '18 at 21:07
  • @wim I don't see the need to explain it further since guys above already did great job at this. I don't want to duplicate content. Please refer to their answers. – Kamil Niski Oct 26 '18 at 21:10
  • agree, it's not good to duplicate content in others answers (in these cases, usually best just to self-delete answer). Was just attempting to explain why it may have been downvoted. – wim Oct 26 '18 at 21:14
  • @wim Downvotes appeared when I posted the first version of the answer; someone probably disliked it. Since my answer is incomplete and I don't want to duplicate the better answers, should I delete mine? – Kamil Niski Oct 26 '18 at 21:19
  • 5
    I think so, but it's up to you. Should you wait for it to maybe get +3 and then delete you may earn a [disciplined](https://stackoverflow.com/help/badges/37/disciplined) badge... :) – wim Oct 26 '18 at 22:04