3

I have a single script as below:

a = 999999999999999999999999999999
b = 999999999999999999999999999999
print(a is b)

Output is:

[root@centos7-sim04 python]# python test2.py
True

On the other hand, the same code with command line:

>>> a = 999999999999999999999999999999
>>> b = 999999999999999999999999999999
>>> print(a is b)
False

The output is False.

  • What is the difference between the 2 ways?
  • When python script running, how does Python VM to manage the integer objects?
  • I see that the number from -5 to 256 is generated by VM automatically when VM started and VM will also assign the empty int blocks(chain struct) to avoid allocating memory frequently for large number storage.
  • Will these blocks be released automatically by python VM when memory is not enough? For my understanding, Python just keeps these blocks to avoid allocating memory frequently so that them will never be released automatically once allocated?

Just test with following code:

for i in range(1, 100000000):
    pass
print("OK")
gc.collect()
time.sleep(20)
print("slept")
for i in range(1, 100000000)
    pass

The memory is:

PID   USER      PR   NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17351 root      20   0    3312060 3.039g 2096 S  11.3 82.4   0:03.53 python

Here is the result of vmstat:

[root@centos7-sim04 ~]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
1  0      0 3376524    40 330084    0    0     2     5   25   41  0  0 100 0  0
1  0      0 185644     40 330084    0    0     0     0  714   28 14  3 82  1  0
0  0      0 967420     40 330084    0    0     0     0  292   15  7  0 93  0  0
0  0      0 967296     40 330084    0    0     0     0   20   23  0  0 100 0  0
0  0      0 967296     40 330084    0    0     0     0   15   17  0  0 100 0  0
0  0      0 967312     40 330068    0    0     0     1   27   39  0  0 100 0  0
1  0      0 185288     40 330068    0    0     0     2  701   55 17  0 83  0  0
0  0      0 3375780    40 330068    0    0     0     0  202   75  3  1 96  0  0
  • It seems that the memory is never released. Is this right?
  • If I want to release the integer objects when memory is not enough, how can I do?

Thanks a lot.

=============================== Update ===============================

range() in Python2 returns the full list which keeps all the items and xrange() in Python2 returns a generator. xrange() is the range() function in Python3.

Here is the link for generator in python and also the PEP link,

https://wiki.python.org/moin/Generators & https://www.python.org/dev/peps/pep-0255/

Mazdak
  • 105,000
  • 18
  • 159
  • 188
qingdaojunzuo
  • 416
  • 1
  • 3
  • 13
  • Are you using Python 2 or 3? If Python 2, your `range(1, 100000000)` call generated a list which will explain a large chunk of memory used. Under Python 3 the process took only 8MB of memory. – metatoaster Sep 11 '16 at 09:43
  • First of all, Python does not run in a VM like Java or C# do. Please notice that Python does not need (even if it's valid syntax) a trailing semicolon at the end. Also indent your code with four spaces. – linusg Sep 11 '16 at 09:44
  • To explain the comment of @metatoaster a bit: in Python 2, `range(x)` creates a list which is **completely** stored in your memory. That may cause trouble for large lists. In Python 3, `range(x)` is a generator function. That means, that you don't create a huge list at once but only the values actually needed (when iterating for example) are stored in memory. – linusg Sep 11 '16 at 09:48
  • Just to add to what @linusg wrote: Python 2 has `xrange` which does the same thing that `range` does in Python 3. – zvone Sep 11 '16 at 09:55
  • Of relevance http://stackoverflow.com/questions/34147515/is-operator-behaves-unexpectedly-with-non-cached-integers – Chris_Rands Sep 11 '16 at 10:23
  • Thanks all, that's very helpful for me. Thanks again. – qingdaojunzuo Sep 11 '16 at 12:14

1 Answers1

1

Regarding your first question, it basically related to the peephole optimizer which simplifies the expressions. i.e. using one integer object for all equal values. It also use this approach for interning the strings.

The reason that you don't see such behavior within the interactive shell is that every command executes separately and gives the corresponding result, whereas in a file or in a function (even in terminal) all the commands interpreted at once. Here is an example:

In [1]: def func():
   ...:     a = 9999999999999
   ...:     b = 9999999999999
   ...:     return a is b
   ...: 

In [2]: func()
Out[2]: True

In [3]: a = 9999999999999

In [4]: b = 9999999999999

In [5]: a is b
Out[5]: False

Regarding your second question, there are actually plenty of misunderstandings here. First off, the duty of VM in python is executing the machine code corresponding to each bytecode, while managing the integers and actually parsing the code and compiling to the bytecode is the interpreter and compiler's task. And as it's mentioned in comments, range() in python 2 returns a list while in python 3 it's a smart object that preserves the start, end and step, and is a generator like object which generated the items on demand.

Also about the functionality of gc.collect, as mentioned in documentation when you don't pass an argument to it, gc.collect run a full collection, and:

The free lists maintained for a number of built-in types are cleared whenever a full collection or collection of the highest generation (2) is run. Not all items in some free lists may be freed due to the particular implementation, in particular float

Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • People usually don't call it a VM, but actually Python interpreter compiles py files into pyc bytecode (just like javac) which is then executed by something which quite resembles Java's VM. Even [python.org mentions it as a virtual machine](https://docs.python.org/2/glossary.html#term-virtual-machine) – zvone Sep 11 '16 at 10:03
  • @zvone Yes, but since its task is executing the machine code corresponding to each bytecode, is has nothing to do with managing the integers or other interpreter's (or compiler) tasks. But thanks for note, I'll explain that too. – Mazdak Sep 11 '16 at 10:08
  • Thanks a lot for the detail explanation. This is very helpful. Thanks again. – qingdaojunzuo Sep 11 '16 at 12:15
  • @qingdaojunzuo If it was helpful you can tell this to community by accepting the answer :). – Mazdak Sep 11 '16 at 12:26