4

From this question (How big can a 64bit signed integer be?), I learned that the biggest signed integer a 64-bit machine can work with is 2^63-1, which is 9,223,372,036,854,775,807. That means that if I exceed it, the result should overflow or become inf. But Python shows no sign of that. Here's what I'm observing:

>>> max = sys.maxsize
>>> format(max, ',')
'9,223,372,036,854,775,807'
>>> a = max * 10
>>> format(a, ',')
'92,233,720,368,547,758,070'
>>> a / max
10.0

Even if for some reason 9,223,372,036,854,775,807 is not the biggest number for Python, then what is the use of sys.maxsize?

Secondly, shouldn't a 64-bit number take 64 bits (8 bytes) of memory? Why are both max and a taking 36 bytes?

>>> sys.getsizeof(max)
36
>>> sys.getsizeof(a)
36

Can anyone please clear up both points of confusion?

Fahim
    python integers can be arbitrarily long (and they are [python objects](https://docs.python.org/3/c-api/structures.html#c.PyObject) with a reference count etc - therefore they take up more space than just 64 bit)... `sys.maxsize` is still relevant - you will only be able to address a list/tuple up to that size (well - you'll run out of memory long before that limit...). – hiro protagonist Sep 16 '20 at 14:36
  • Does this answer your question? [Handling very large numbers in Python](https://stackoverflow.com/questions/538551/handling-very-large-numbers-in-python) – Chase Sep 16 '20 at 14:37
  • Then why does this question (https://stackoverflow.com/questions/6003492/how-big-can-a-64bit-signed-integer-be) say the biggest number is `9,223,372,036,854,775,807`? How is `sys.maxsize` still relevant? Please show an example. – Fahim Sep 16 '20 at 14:52
  • Python's integers are not a native 64-bit signed numbers. The size limit of native 64-bit numbers is thus irrelevant. – MisterMiyagi Sep 16 '20 at 14:53
  • And thus the `sys.maxsize` should also be irrelevant, right? Because the [documentation](https://docs.python.org/3/library/sys.html#sys.maxsize) itself talks about the 64-bit limit. – Fahim Sep 16 '20 at 14:56
  • That's the limit for ``Py_ssize_t``, which is not a Python ``int``. – MisterMiyagi Sep 16 '20 at 14:56
  • So it means that the `indexing` cannot exceed the 64-bit limit? – Fahim Sep 16 '20 at 15:01
  • On CPython, indexing and any other operation expressed as ``Py_ssize_t`` is restricted to the machine word size – i.e. 64-bit or 32-bit. Note that in practice this is one bit less than 64-bit/32-bit, because the number is sized. See [``__len__`` can't return big numbers](https://stackoverflow.com/questions/60340710/len-cant-return-big-numbers) for some background. – MisterMiyagi Sep 16 '20 at 15:31
    For the record, Python 2 had both `sys.maxint` and `sys.maxsize`. Now the latter doesn't have much to do with integers, it's referring to the largest `Py_ssize_t` on platform - no longer really relevant in day to day Python programming unless you write C extensions. – wim Sep 16 '20 at 15:44
  • That helps. Can you please describe the `Py_ssize_t` thing? I'm trying to understand it, but everything written about it is very tough to understand. – Fahim Sep 16 '20 at 15:58
  • ``Py_ssize_t`` is basically the POSIX|C|C++ ``ssize_t`` (and you are likely to find more information looking for this instead). It is a *signed* indicator for *sized* containers – both the size of the container and position of elements. Being signed is very important in Python, since negative indices are common and well-defined. – MisterMiyagi Sep 16 '20 at 16:11

1 Answer

6

Integers as Digit-Arrays

Python 3 (CPython) integers are not native machine integers. Logically, each integer consists of a sign and an absolute value, where the absolute value is stored as a variable-size array of unsigned "digits" in base 1073741824 (30-bit) or 32768 (15-bit) [*]. To store a larger number, an additional "digit" is appended to the array.

>>> sys.getsizeof(0)          # largest  0-digit number
24
>>> sys.getsizeof(1)          # smallest 1-digit number
28
>>> sys.getsizeof(2**30 - 1)  # largest  1-digit number
28
>>> sys.getsizeof(2**30)      # smallest 2-digit number
32
>>> sys.getsizeof(2**60 - 1)  # largest  2-digit number
32

Loosely speaking, this is the same mechanism by which one adds digits when writing out decimal numbers – one digit is enough up to 9, two digits up to 99, and so on. Likewise, as long as the computer has memory to "add a digit", a Python integer of arbitrary size can be represented.
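
For example (a sketch assuming a 64-bit CPython build with 30-bit digits, so the exact byte counts may differ elsewhere), a value far beyond the 64-bit range is simply stored with more digits, and arithmetic keeps working:

>>> (2**1000).bit_length()        # far beyond 64 bits
1001
>>> sys.getsizeof(2**1000)        # 24-byte header + 34 digits * 4 bytes
160
>>> 2**1000 * 2**1000 == 2**2000  # no overflow, just more digits
True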

[*] The digits are 30-bit/15-bit instead of 32-bit/16-bit because this better fits some algorithms. For example, long_pow() requires the number of bits per digit to be divisible by 5.
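
If you want to check which digit width your interpreter uses, sys.int_info reports it; the values shown below assume a typical 64-bit build (a 32-bit build would report 15 and 2 instead):

>>> import sys
>>> sys.int_info.bits_per_digit  # bits stored per digit
30
>>> sys.int_info.sizeof_digit    # bytes used per digit
4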

Object Header for Integers

Practically, integers are also objects – meaning they hold metadata such as their type and reference count – and that metadata also takes up space. In CPython, an int consists of:

  • a reference counter of type Py_ssize_t
  • a pointer to its type, of type PyTypeObject*
  • a digit count of type Py_ssize_t
  • a variable-length array of type digit[]

where the first three fields form the header shared by every variable-size object. The sign is encoded in the digit count, which is negative for negative numbers.

On a 64-bit machine, both Py_ssize_t and PyTypeObject* are 8 bytes in size – giving the "0-digit integer" 0 a size of 3*8 bytes, or 24 bytes.

>>> sys.getsizeof(0)          # largest  0-digit number
24
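
This also explains the 36 bytes from the question: sys.maxsize has 63 bits and sys.maxsize * 10 has 67 bits, so both need ceil(bits / 30) = 3 digits of 4 bytes each on top of the 24-byte header. A minimal sketch of that arithmetic (assuming a 64-bit CPython build with 30-bit digits; predicted_size is just an illustrative helper, not part of any API):

>>> import math, sys
>>> def predicted_size(n):              # hypothetical helper for illustration
...     digits = math.ceil(n.bit_length() / 30) if n else 0
...     return 24 + 4 * digits          # 24-byte header + 4 bytes per digit
...
>>> predicted_size(sys.maxsize), sys.getsizeof(sys.maxsize)
(36, 36)
>>> predicted_size(sys.maxsize * 10), sys.getsizeof(sys.maxsize * 10)
(36, 36)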

So what is sys.maxsize?

The meaning of sys.maxsize is not the maximum value of an integer, but the maximum size of a container:

>>> len(range(sys.maxsize))    # this is fine
9223372036854775807
>>> len(range(sys.maxsize+1))  # this is one too many
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: range() result has too many items

This is a direct result of sys.maxsize expressing the maximum value of Py_ssize_t, the type the CPython runtime uses to store container sizes and indices and, more generally, to address memory. While this might seem like an arbitrary restriction, it is far more than the amount of memory any current machine can actually address.
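
To make the distinction concrete (assuming a 64-bit build, where sys.maxsize equals 2**63 - 1; on a 32-bit build it is 2**31 - 1 instead): an int may exceed sys.maxsize freely, only lengths and indices are capped by it:

>>> sys.maxsize == 2**63 - 1  # maximum value of Py_ssize_t on a 64-bit build
True
>>> sys.maxsize ** 2          # a Python int is not limited by it
85070591730234615847396907784232501249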

MisterMiyagi
    Writing "as of CPython3.9" makes it sound as though something has changed here recently, but there has not been anything for years afaik (maybe not even since python 3.0) – wim Sep 16 '20 at 15:50
  • @wim Appreciate the feedback, but lack a better wording. I don't want to say "as of CPython3.0" since to me that implies it was like this "back then". Should I just drop the version reference? – MisterMiyagi Sep 16 '20 at 15:53
    Yeah, I think so. Just "In CPython, an int..." – wim Sep 16 '20 at 15:54
  • Would you kindly describe the `Py_ssize_t` thing? I'm trying to understand it but couldn't get it yet. – Fahim Sep 16 '20 at 15:59
  • @Fahim if you were used to C or C++ it would make more sense. It's an integer type that's big enough to hold the size of any possible object in memory. For a 64-bit system it would be 64 bits. – Mark Ransom Sep 16 '20 at 16:10
  • ``Py_ssize_t`` is the type used to store container sizes and element positions in CPython. It is derived from the POSIX|C|C++ ``ssize_t``. While being signed restricts the maximum size of containers (on a 64-bit machine to a ludicrous 8 EiB instead of an equally ludicrous 16 EiB), it allows one to safely work with and store negative indices. Negative indices are frequently used in Python, and as such are important to support correctly. – MisterMiyagi Sep 16 '20 at 16:21
  • There is a good novice-level explanation about the size type from Tim Peters [here](https://stackoverflow.com/questions/20987390/cython-why-when-is-it-preferable-to-use-py-ssize-t-for-indexing/20987522#20987522). – wim Sep 16 '20 at 17:06