I am writing Python code to do some big-number calculations, and I have serious concerns about the memory used in the calculation.

Thus, I want to count every bit of each variable.

For example, I have a variable x, which is a big number, and want to count the number of bits for representing x.

The following code is obviously useless:

x = 2**1000
len(x)  # TypeError: object of type 'int' has no len()

Thus, I turn to use the following code:

x=2**1000
len(repr(x))

The variable x (in decimal) is:

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

but the above code returns 303

The above long sequence has 302 digits, so I believe that 303 must be related only to the string length (the extra character being the trailing 'L' that Python 2's repr appends to longs).

So, here comes my original question:

How can I know the memory size of variable x?

One more thing; in C/C++ language, if I define

int z=1;

This means that 4 bytes = 32 bits are allocated for z, and the bits are arranged as 00..001 (31 0's and one 1).

Here, my variable x is huge; I don't know whether it follows the same memory-allocation rule.
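As an aside on counting bits: if the goal is just the number of bits needed to represent the value of x (as opposed to the memory the Python object occupies), the built-in `int.bit_length()` method (available since Python 2.7/3.1) reports exactly that:

```python
# Count the bits needed to represent the numeric value itself,
# independent of any per-object overhead.
x = 2 ** 1000
print(x.bit_length())              # 1001: a 1 followed by 1000 zero bits
print((x.bit_length() + 7) // 8)   # 126: minimum bytes to pack those bits
```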

Ashwini Chaudhary
user4478
  • I just found that sys.getsizeof(x) seems to be useful. I use sys.getsizeof(x), where x=2**1000, and the call returns 160. Does this mean that x occupies 160 bytes? Or actually 160 bits? – user4478 Jan 17 '13 at 04:03
  • Unless you're dealing with an extremely low-level hardware implementation, no one measures anything in bits. For almost all intents and purposes the lowest unit of computing is the `byte`. – Jonathon Reinhart Jan 17 '13 at 04:05

2 Answers


Use sys.getsizeof to get the size of an object, in bytes.

>>> from sys import getsizeof
>>> a = 42
>>> getsizeof(a)
12
>>> a = 2**1000
>>> getsizeof(a)
146
>>>

Note that the size and layout of an object is purely implementation-specific. CPython, for example, may use totally different internal data structures than IronPython. So the size of an object may vary from implementation to implementation.
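A small sketch of that point: getsizeof reports the whole object, header included, so it is always larger than the raw numeric payload, and the exact figure depends on the interpreter (the values below assume CPython):

```python
import sys

x = 2 ** 1000
raw_bytes = (x.bit_length() + 7) // 8   # 126 bytes of pure numeric data
total = sys.getsizeof(x)                # object size in this interpreter

# The object always costs more than the raw digits because of the
# header (refcount, type pointer, digit count); the exact total
# varies between CPython versions and other implementations.
print(raw_bytes, total)
```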

Jonathon Reinhart
  • thank you, so how is the memory arrangement? – user4478 Jan 17 '13 at 04:04
  • You can't know. That is an implementation detail of the version of python you're using. It all depends on how they manage arbitrarily-sized integers internally. – Jonathon Reinhart Jan 17 '13 at 04:06
  • Thank you! Although the memory arrangement is version-dependent, do you have any idea or website suggestions that have some more such memory implementation details? – user4478 Jan 17 '13 at 04:08
  • You can start with http://www.python.org/dev/peps/pep-0237/ – gpoo Jan 17 '13 at 04:09
  • http://stackoverflow.com/questions/1331471/in-memory-size-of-python-stucture shows the sizes on the different Python versions. – Nicolas Brown Jan 17 '13 at 04:09
  • @user4478 [This website](http://hg.python.org/cpython/file/6d06b223c664/Include/longintrepr.h#l75) might be of use. – user4815162342 Jan 17 '13 at 04:10
  • Hi Jonathon! Why is the minimum size of a one-character string in Python 25 bytes? `>>> getsizeof('a')` gives `25` and `>>> getsizeof('ab')` gives `26`. – Grijesh Chauhan Jan 17 '13 at 05:15
  • There's of course bookkeeping information associated with that string: at least two or three pointers, and possibly other metadata. Again, you'd have to look at the implementation to fully understand why. The links in the comments above should point you in the right direction. – Jonathon Reinhart Jan 17 '13 at 05:18

Regarding the internal structure of a Python long, check sys.int_info (or sys.long_info for Python 2.7).

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)

Python either stores 30 bits into 4 bytes (most 64-bit systems) or 15 bits into 2 bytes (most 32-bit systems). Comparing the actual memory usage with calculated values, I get

>>> import math, sys
>>> a=0
>>> sys.getsizeof(a)
24
>>> a=2**100
>>> sys.getsizeof(a)
40
>>> a=2**1000
>>> sys.getsizeof(a)
160
>>> 24+4*math.ceil(100/30)
40
>>> 24+4*math.ceil(1000/30)
160

There are 24 bytes of overhead for 0 since no digits are stored. The memory requirements for larger values match the calculated values.
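That arithmetic can be checked programmatically against sys.int_info. This sketch assumes CPython and assumes only that each extra internal digit adds sizeof_digit bytes; the fixed per-object header may still differ between versions, so it compares growth rather than absolute sizes:

```python
import math
import sys

bpd = sys.int_info.bits_per_digit    # 30 on most 64-bit builds, 15 on 32-bit
sod = sys.int_info.sizeof_digit      # 4 bytes per digit (2 on 32-bit builds)

def ndigits(n):
    # Internal digits CPython needs to store the nonzero int n.
    return math.ceil(n.bit_length() / bpd)

# The header size may vary by CPython version, but the growth per
# digit should match sizeof_digit exactly.
grow = sys.getsizeof(2 ** 1000) - sys.getsizeof(2 ** 100)
print(grow)  # 120 with 30-bit digits: (34 - 4) digits * 4 bytes each
```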

If your numbers are so large that you are concerned about the 6.25% of unused bits (2 of every 32), you should probably look at the gmpy2 library. Its internal representation uses all available bits, and computations are significantly faster for large values (say, greater than 100 digits).

casevh