Garbage collection in Python for Linux

Question

I'm a little puzzled about how Python allocates memory and garbage-collects, and how platform-specific that is. For example, compare the following two code snippets:

Snippet A:

>>> id('x' * 10000000) == id('x' * 10000000)
True

Snippet B:

>>> x = "x"*10000000
>>> y = "x"*10000000
>>> id(x) == id(y)
False

Snippet A returns True because Python allocates the memory at the same location for both strings in the first test, but at different locations in the second test, which is why the ids differ there.
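A minimal Python 3 sketch of the difference between the two snippets, assuming CPython (where an object is freed as soon as its last reference goes away). The only promise `id()` makes is uniqueness among objects that exist at the same time, so Snippet B must print False, while Snippet A may print either value.

```python
n = 10_000_000  # a variable, so 'x' * n is computed at run time, not folded

# Snippet B: x and y are alive at the same time. On CPython each 'x' * n
# builds a new object, and two live objects never share an id.
x = 'x' * n
y = 'x' * n
print(id(x) == id(y))  # False

# Snippet A: the first temporary is released as soon as id() has returned,
# so the allocator is free to hand the same memory to the second temporary.
# Whether it actually does so is up to the allocator; this can print either.
print(id('x' * n) == id('x' * n))
```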

But apparently system performance or the platform affects this, because when I try it on a larger scale:

for i in xrange(1, 1000000000):
    if id('x' * i) != id('x' * i):
        print i
        break
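The loop above is Python 2 (`xrange`, `print i`). A Python 3 port, sketched under the assumption that returning the first mismatch is as useful as printing it; `first_mismatch` is a hypothetical helper, and the limit of 10,000 is far smaller than the original 1,000,000,000 so that it finishes quickly. The value it returns is platform-dependent.

```python
def first_mismatch(limit):
    """Hypothetical helper: return the first i in range(1, limit) for which two
    successive temporaries 'x' * i get different ids, or None if there is none."""
    for i in range(1, limit):
        # The first temporary is freed once id() returns, so the second one
        # may (or may not) be placed at the same address.
        if id('x' * i) != id('x' * i):
            return i
    return None

print(first_mismatch(10_000))
```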

A friend on a Mac tried this, and the loop ran to completion. When I ran it on several Linux VMs, it invariably printed a value and broke out of the loop, but at a different value on each VM. Is this because of the scheduling of garbage collection in Python? Is it because my Linux VMs are slower than the Mac, or does the Linux Python implementation garbage-collect differently?

Here is another helpful discussion: [why-is-keyword-has-different-behavior-when-there-is-dot-in-the-string](http://stackoverflow.com/questions/2858603/why-is-keyword-has-different-behavior-when-there-is-dot-in-the-string) – pbanka, Nov 01 '12 at 16:22

score 6 · Answer 1 · answered Oct 30 '12 at 19:45

The garbage collector just uses whatever space is convenient. There are lots of different garbage collection strategies, and things are also affected by parameters, different platforms, memory usage, the phase of the moon, etc. Trying to guess how the interpreter will happen to allocate particular objects is just a waste of time.

score 5 · Answer 2 · answered Oct 30 '12 at 19:50

It happens because Python caches small integers and strings:

large strings stored in variables: not cached

In [32]: x = "x"*10000000

In [33]: y = "x"*10000000

In [34]: x is y
Out[34]: False

large strings not stored in variables: looks like they are cached

In [35]: id('x' * 10000000) == id('x' * 10000000)
Out[35]: True

small strings: cached

In [36]: x="abcd"

In [37]: y="abcd"

In [38]: x is y
Out[38]: True
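The `"abcd"` case works because CPython interns string literals that look like identifiers when it compiles them. A sketch (Python 3) of getting the same sharing for strings built at run time with the standard-library function `sys.intern`; the "False" line reflects CPython behavior, not a language guarantee.

```python
import sys

# Build two equal strings at run time so the compiler cannot fold them
# into a single shared constant.
parts = ["ab", "cd"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # True: same value
print(a is b)  # False on CPython: two separate objects, both alive

# sys.intern() returns the canonical copy from the interpreter's intern
# table, so interned equal strings are the same object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True
```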

small integers: cached

In [39]: x=3

In [40]: y=3

In [41]: x is y
Out[41]: True
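CPython keeps a table of preallocated small ints (traditionally -5 through 256) and hands out those objects instead of creating new ones. A sketch assuming CPython; other implementations (PyPy, for instance) define `is` on ints differently, so the prints are guarded.

```python
import platform

base = 255       # variables, so the additions below happen at run time
big = 12345677

a = base + 1     # 256: inside CPython's small-int cache
b = base + 1
c = big + 1      # 12345678: far outside the cache
d = big + 1

if platform.python_implementation() == "CPython":
    print(a is b)  # True: both names refer to the one preallocated 256
    print(c is d)  # False: each addition creates a new int object
```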

large integers:

stored in variables: not cached

In [49]: x=12345678

In [50]: y=12345678

In [51]: x is y
Out[51]: False

not stored: cached

In [52]: id(12345678)==id(12345678)
Out[52]: True
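The "not stored" case is less about caching than about compilation: both literals in `id(12345678)==id(12345678)` belong to one code object, and CPython stores equal constants once per code object. The IPython session compiles each input line separately, which is why the variables got distinct objects. A sketch assuming CPython, using the built-in `compile` and the code object's `co_consts` attribute.

```python
# Both literals sit in one code object, so the value is stored once in
# co_consts and both id() calls see the same object.
expr = compile("id(12345678) == id(12345678)", "<demo>", "eval")
print(expr.co_consts.count(12345678))  # 1 on CPython
print(eval(expr))                      # True on CPython

# Assignments compiled together (as in a script) share the constant too,
# unlike the interactive session where each line is compiled on its own.
ns = {}
exec(compile("x = 12345678\ny = 12345678", "<script>", "exec"), ns)
print(ns["x"] is ns["y"])              # True on CPython
```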

Ashwini Chaudhary

That doesn't explain the behavior the question is asking about. – Antimony Oct 30 '12 at 23:18
@Antimony the behavior he's observing is result of caching done by python. So I tried to explain that a bit. – Ashwini Chaudhary Oct 30 '12 at 23:24
No, it's not. Interning explains why it doesn't simply return 1 every time. But the question is why it returns varying values on different platforms and sometimes doesn't return at all. – Antimony Oct 30 '12 at 23:58

ebo · Accepted Answer · 2012-10-30T20:30:27.160

CPython uses two strategies for memory management:

Reference Counting
Mark-and-Sweep Garbage Collection
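The two strategies can be told apart in a short CPython-specific sketch: with the cyclic collector switched off, an object with no cycles is still freed the moment its reference count reaches zero, while objects in a reference cycle wait for `gc.collect()`. `Tracked` is a hypothetical class used only to record when `__del__` runs (Python 3.4+ collects cycles that have `__del__` methods).

```python
import gc

class Tracked:
    """Hypothetical helper that records its name when it is finalized."""
    freed = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracked.freed.append(self.name)

gc.disable()                   # turn off mark-and-sweep; refcounting still runs
try:
    t = Tracked("plain")
    del t                      # refcount drops to zero: freed immediately
    a = Tracked("cycle-a")
    b = Tracked("cycle-b")
    a.other, b.other = b, a    # reference cycle keeps both refcounts above zero
    del a, b                   # nothing is freed yet
    before_collect = list(Tracked.freed)
finally:
    gc.enable()

gc.collect()                   # the cycle collector reclaims the pair
print(before_collect)          # ['plain']
print(sorted(Tracked.freed))   # ['cycle-a', 'cycle-b', 'plain']
```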

Allocation is in general done via the platform's malloc/free functions and inherits the performance characteristics of the underlying runtime. Whether memory is reused is decided by the platform's allocator and operating system, not by Python. (Some objects are pooled by the Python VM itself.)

However, your example does not trigger the 'real' GC algorithm, which is only used to collect reference cycles. Each long string is deallocated by reference counting as soon as its last reference is dropped.
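One way to check this claim is to watch the cyclic collector directly. A sketch assuming Python 3.3+ (for `gc.callbacks`); `same_id_fraction` is a hypothetical helper. If the fraction looks the same with the collector disabled, and few or no collections are observed, the collector's scheduling is not what differs between machines; the result still depends on the platform's allocator.

```python
import gc

def same_id_fraction(n, trials):
    """Hypothetical helper: fraction of trials in which two successive
    temporaries 'x' * n were given the same id."""
    hits = sum(id('x' * n) == id('x' * n) for _ in range(trials))
    return hits / trials

gc_starts = []

def on_gc(phase, info):
    # Entries in gc.callbacks are called with phase "start" or "stop".
    if phase == "start":
        gc_starts.append(info.get("generation"))

gc.callbacks.append(on_gc)
try:
    with_gc = same_id_fraction(1_000_000, 20)
    gc.disable()               # switches off the cyclic collector only
    without_gc = same_id_fraction(1_000_000, 20)
finally:
    gc.enable()
    gc.callbacks.remove(on_gc)

print("same-id fraction, GC enabled: ", with_gc)
print("same-id fraction, GC disabled:", without_gc)
print("cyclic collections observed:  ", len(gc_starts))
```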

edited Oct 30 '12 at 20:30

answered Oct 30 '12 at 19:50

ebo

*CPython* does that. While knowing that is useful for hacks, for explaining the results of individual experiments, etc., it's ultimately as much of an implementation detail as, say, the caching of integers. – Oct 30 '12 at 20:13
True, fixed. I still think the original question was targeted at CPython's behaviour. – ebo Oct 30 '12 at 20:31
