On a 64-bit system an integer in Python takes 24 bytes. This is 3 times the memory that would be needed in e.g. C for a 64-bit integer. Now, I know this is because Python integers are objects. But what is the extra memory used for? I have my guesses, but it would be nice to know for sure.

See this article: http://www.laurentluce.com/posts/python-integer-objects-implementation/ and also https://docs.python.org/2/c-api/structures.html for common object structures in Python – DNA Apr 11 '14 at 15:26
-
@DNA: that talks about the Python 2 basic `int` type; the `long` type in Python 2 (replaced by the `int` type in Python 3) is a little more complicated still. – Martijn Pieters Apr 11 '14 at 15:40
-
Note that this question and its answers are specific to the CPython reference implementation. Other implementations could have entirely different memory usages (though the same general principles apply: the size and other object metadata need to be stored). – Bob Apr 11 '14 at 22:44
2 Answers
Remember that the Python `int` type does not have a limited range like C `int` has; the only limit is the available memory.

Memory goes to storing the value, the current size of the integer storage (the storage size is variable to support arbitrary sizes), and the standard Python object bookkeeping (a reference to the object's type and a reference count).

You can look up the longintrepr.h source (the Python 3 `int` type was traditionally known as the `long` type in Python 2); it makes effective use of the `PyVarObject` C type to track integer size:
```c
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
```
The `ob_digit` array stores 'digits' that are either 15 or 30 bits wide (depending on your platform); so on my 64-bit OS X system, an integer up to (2 ^ 30) - 1 uses 1 'digit':

```python
>>> sys.getsizeof((1 << 30) - 1)
28
```

but if you use two 30-bit digits in the number an additional 4 bytes are needed, etc.:

```python
>>> sys.getsizeof(1 << 30)
32
>>> sys.getsizeof(1 << 60)
36
>>> sys.getsizeof(1 << 90)
40
```

The base 24 bytes then are the `PyObject_VAR_HEAD` structure, holding the object size, the reference count and the type pointer (each 8 bytes / 64 bits on my 64-bit OS X platform).
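The 4-bytes-per-digit growth shown above can be sketched in pure Python. This is a hedged model, not CPython's actual code: it assumes a 24-byte `PyObject_VAR_HEAD` and 30-bit digits (both platform-dependent), and the function name is made up for illustration:

```python
import sys

def predicted_size(n, header=24, digit_bytes=4, bits_per_digit=30):
    """Model of int object size, assuming a 64-bit CPython with 30-bit digits."""
    # A value needs ceil(bit_length / 30) digits; use at least one digit.
    n_digits = max(1, -(-n.bit_length() // bits_per_digit))
    return header + digit_bytes * n_digits

for value in [(1 << 30) - 1, 1 << 30, 1 << 60, 1 << 90]:
    print(predicted_size(value), sys.getsizeof(value))
```

On a matching build the two columns agree (28, 32, 36, 40); other builds or Python versions may report different numbers.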
On Python 2, integers `<= sys.maxint` but `>= -sys.maxint - 1` are stored using a simpler structure storing just the single value:

```c
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
```

Because this uses `PyObject` instead of `PyVarObject` there is no `ob_size` field in the struct, and the memory size is fixed at just 24 bytes: 8 for the `long` value, 8 for the reference count and 8 for the type object pointer.
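To make the variable-size scheme concrete, here is a rough pure-Python sketch of how a value can be split into 30-bit digits with the sign carried by the digit count, the way CPython uses a negative `ob_size`; the function name and tuple layout are illustrative only, not CPython API:

```python
def to_digits(n, shift=30):
    """Split n into 30-bit 'digits'; the sign rides on the digit count,
    mimicking CPython's ob_size / ob_digit pair (illustrative only)."""
    mask = (1 << shift) - 1
    sign = -1 if n < 0 else 1
    n = abs(n)
    digits = []
    while n:
        digits.append(n & mask)
        n >>= shift
    return sign * len(digits), digits  # (ob_size, ob_digit)

print(to_digits(1 << 30))  # (2, [0, 1])
print(to_digits(-1))       # (-1, [1])
print(to_digits(0))        # (0, [])
```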

-
How are negative values handled, if an int is stored as a sequence of digits? Is there a concept of two's complement in Python? If I print `hex(-1)` I get `-0x1`, and similarly `bin(-1)` gives `-0b1`. I understand that this may not be what is represented internally, but how does Python decide that a value is negative if the high bit is not set? – Har Oct 08 '16 at 12:02
-
@Har: the object size is set to a negative value; see the [linked header file](https://hg.python.org/cpython/file/5e303360db14/Include/longintrepr.h#l74): *Negative numbers are represented with ob_size < 0*. So if an integer representation requires 2 `ob_digit` entries, `ob_size` is either `2` or `-2`, the latter signalling it is a negative integer. – Martijn Pieters Oct 08 '16 at 13:21
-
So that means it is not two's complement; it is simply a bit in the structure which represents whether it is negative or not? – Har Oct 08 '16 at 15:18
-
@Har: exactly; the internal representation does not use two's complement. – Martijn Pieters Oct 08 '16 at 15:21
-
What happens with the remaining 1 or 2 bits in each 'digit'? (Since 16 or 32 bits are used to store digits of only 15 or 30 bits) – PieterNuyts Mar 27 '19 at 08:44
-
I'm guessing that in addition to the >= 24 bytes needed to store the object, you will need an additional 8 bytes (on a 64-bit system) to store a pointer to it, right? – PieterNuyts Mar 27 '19 at 08:51
-
@PieterNuyts: those bits are ignored. It's just that for specific arbitrary precision integer algorithms, either 15 or 30 bits are the best sizes that fit the [constraints that are documented in the source](https://github.com/python/cpython/blob/101ddba62d91705149c73b2aad6aad3fe305d58f/Include/longintrepr.h#L11-L42). – Martijn Pieters Mar 27 '19 at 12:04
-
@PieterNuyts: any references *to* integer objects are counted elsewhere. There can be any number of references to a Python object. So list objects indeed take 8 bytes per element in the list, because all a list holds are pointers to Python objects. Global variables are stored in a `dict`, and keys and values in a dict are pointers too. Locals are stored in an array of pointers, etc. – Martijn Pieters Mar 27 '19 at 12:06
-
@MartijnPieters Agreed, but if you compare to e.g. C++, not only does a C++ integer only require 8 bytes instead of 24, but also you don't _have_ to use another 8 bytes to store a pointer to it (though of course you can). So the total overhead for storing a single integer number compared to C++ is at least 24 bytes (on a 64-bit system), not 16. – PieterNuyts Apr 03 '19 at 13:09
-
@PieterNuyts: you need to compare this with C++ objects that live on the *heap*, not on the stack. E.g. with objects that require a pointer on the stack to keep track of them. – Martijn Pieters Apr 03 '19 at 13:51
-
@MartijnPieters Depends what you want to do. If you want objects on the heap, you should compare C++ heap objects with Python objects. If you want objects on the stack, you should compare C++ stack objects with Python stack objects. Since the latter don't exist, they need to be replaced by heap objects, so I think it makes perfect sense to account for the complete overhead that involves. – PieterNuyts Apr 11 '19 at 07:33
-
@AlwaysLearning: `digit ob_digit` is not an array, that's a single value. `digit` is the type of a single digit. The `_longobject` is the *whole Python int object holding the digits* (the `_longobject` definition is reused for both the int and the bool types, the `PyLongObject *result` variable is a `_longobject` struct, via `typedef struct _longobject PyLongObject` elsewhere). The malloc allocates memory for `result` based on the offset for `ob_digit` (everything before `ob_digit` fits into that) plus the number of `digit` values needed. – Martijn Pieters Nov 28 '22 at 09:19
-
@AlwaysLearning: perhaps the [GDB proxy definition for `PyLongObject`](https://github.com/python/cpython/blob/1e197e63e21f77b102ff2601a549dda4b6439455/Tools/gdb/libpython.py#L879-L921) can help, that's Python code used to show the value of the object when using the GDB debugger. – Martijn Pieters Nov 28 '22 at 09:23
-
@MartijnPieters I still don't get it. Namely, declaring something as an array (`digit ob_digit[1]`) is saying that automatic memory allocation is used. If `struct _longobject` contains all the digits, then we need dynamic memory allocation and it should say `digit *ob_digit`... – AlwaysLearning Nov 30 '22 at 08:47
-
@AlwaysLearning: no, `digit ob_digit[1]` is not saying automatic memory allocation is used, sorry. This is veering very far away from the subject of this question, however. If you want to dig into this subject, then perhaps you can post a question about that. – Martijn Pieters Nov 30 '22 at 15:19
-
Why does my Python interpreter (version 3.10, on a 64-bit machine) use 24 bytes for 0 but 28 bytes for 1? What I am essentially asking is why 0 takes no extra space while 1 does; does Python have a special case for 0? As far as I can see, up to 2**63-1 the size is only 28. – novice Mar 31 '23 at 13:25
-
@novice: `0` is a python `longobject` structure storing zero digits, `1` stores a single digit. So, `1` needs 4 more bytes to store that digit. – Martijn Pieters Jun 10 '23 at 21:28
From longintrepr.h, we see that a Python 'int' object is defined with this C structure:
```c
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
```
A `digit` is a 32-bit unsigned value (of which only 30 bits are used on typical builds). The bulk of the space is taken by the variable-size object header. From object.h, we can find its definition:

```c
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
```
We can see that a `Py_ssize_t` (64 bits on a 64-bit system) is used to store the count of "digits" in the value, which is possibly wasteful. We can also see that the general object header has a 64-bit reference count and a pointer to the object type, which also takes 64 bits of storage. The reference count is necessary for Python to know when to deallocate the object, and the pointer to the object type is necessary to know that we have an int and not, say, a string, since C structures have no way to test the type of an object from an arbitrary pointer.
`_PyObject_HEAD_EXTRA` is defined to nothing on most builds of Python, but can be used to store a linked list of all Python objects on the heap if the build enables that option, using another two pointers of 64 bits each.
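Both header fields described above are observable from Python without touching C: `sys.getsizeof` reports the full structure size, and `sys.getrefcount` reads the reference count from the header. The exact numbers are implementation- and version-dependent, so treat this only as a demonstration:

```python
import sys

small = 1 << 3    # fits in a single 30-bit digit
big = 1 << 333    # needs a dozen 30-bit digits

# Only the digit array grows; the header cost is the same for both objects.
print(sys.getsizeof(small), sys.getsizeof(big))

# The refcount lives in the header; small ints are cached by CPython,
# so their count is typically large.
print(sys.getrefcount(small))
```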
