6

I create a list of a million int objects, then replace each with its negated value. tracemalloc reports 28 MB extra memory (28 bytes per new int object). Why? Does Python not reuse the memory of the garbage-collected int objects for the new ones? Or am I misinterpreting the tracemalloc results? Why does it say those numbers, what do they really mean here?

import tracemalloc

xs = list(range(10**6))
tracemalloc.start()
for i, x in enumerate(xs):
    xs[i] = -x
print(tracemalloc.get_traced_memory())

Output (Try it online!):

(27999860, 27999972)

If I replace xs[i] = -x with x = -x (so the new object rather than the original object gets garbage-collected), the output is a mere (56, 196) (try it). How does it make any difference which of the two objects I keep/lose?

And if I do the loop twice, it still only reports (27992860, 27999972) (try it). Why not 56 MB? How is the second run any different for this than the first?

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • After reading the documentation, starting tracing from the previous line, reviewing outputs for multiple lines and comparing snapshots, I'm not sure what the outputs mean anymore. – Michael Szczesny Mar 12 '22 at 19:07
  • @MichaelSzczesny Yes, I also tried several other things, like running the loop twice (no change, still reports 28 MB), storing `x + 10**6` instead of `-x` (still no change), or just computing `-x` but not storing it (that reports `(56, 196)`). I don't get what the outputs mean, either, and would like to. – Kelly Bundy Mar 12 '22 at 19:15
  • I don't know exactly what the numbers mean but my guess is that what you are seeing is not new allocation but just tracemalloc becoming aware of the existing allocation. Putting the tracemalloc.start() at the top and adding another tracemalloc.get_traced_memory() before the loop that negates xs shows no major inclrease in memory used. – William Mar 12 '22 at 20:37
  • @William Hmm, but then why didn't it become aware of the existing allocation when I only computed `-x` without storing it? (See my previous comment). – Kelly Bundy Mar 12 '22 at 21:23
  • 1
    Python shouldn't reuse memory of old int objects, the question is why garbage collector keep old numbers in memory even after loop finished. Weird. – Olvin Roght Mar 14 '22 at 16:51
  • @OlvinRoght What do you mean with "shouldn't reuse memory of old int objects"? That sounds wrong. – Kelly Bundy Mar 14 '22 at 16:54
  • @KellyBundy I've meant that `-x` is another int, it replaces old int in list after that `x` should be cleaned out of memory but it doesn't. – Olvin Roght Mar 14 '22 at 16:55
  • @OlvinRoght Ah, so we mean "reuse the memory" differently. When I said that, I meant it as in an old object got garbage collected, so its memory is available again, and a new object can then reuse that memory. I don't mean it as in directly overwriting the object data without going through garbage collection and creating a new object (which I now think is how you meant it). – Kelly Bundy Mar 14 '22 at 17:07
  • I think CPython might be optimizing away `-x` in the `x = -x` example, since the new value of `x` doesn't get used. If I add a line like `print(x, file=devnull)` (with `devnull = open(os.devnull, "w")` having been defined earlier), `tracemalloc` outputs `(1648, 117089)`. Not as much as reassigning to the list itself, but significantly more than when `x` wasn't subsequently used. – chepner Mar 14 '22 at 18:05
  • @chepner I don't think it optimizes that away. That's not the kind of thing it does. And at least [dis](https://tio.run/##TU7bCsIwDH3PV@Rt6dChCCLCvmWULc7CeiGN0H19LeLDAudAwrkk7fqO4fZIUqvzKYri4jJAo6GBuq6D/13FzuzttsUZoGQccXNZSWxYma6Xvr8bAwfRkNWKkoFXFHQnLOgCcvh4FqtMJZsnYJvSks4FkrigdPSvrNNvXybPPspOraA9ZGr9Ag) still shows the operation in there. I think that extra code itself is responsible for those larger numbers. – Kelly Bundy Mar 14 '22 at 18:10
  • @chepner Also, if I do that but print with `flush=True`, it's only [`(56, 480)`](https://tio.run/##TY3dCoMwDEbv@xRhV6042RiMMfAtdi9Foxb6Rxq3@vROnAxzlZwvnC/OPAZ/e0RaFuNiIAYm3aLT1oZW7CgkIXKCGqxJLEn7AeX1UhR3pUSHbz9Zu4YhopchVTsp4fQ5KXHQVYk1sVSiDwSmhAzGA/rJIWlGmZN6Clgnr7Jz3tZIxrPMJfTGYv0393ZKY/2iCZX4vRxrBuRmu7vGoQs0S6WW5Qs). – Kelly Bundy Mar 14 '22 at 18:18
  • True, `STORE_NAME` is still generated, but I'm not sure the byte code executer makes use of it. *Something* changes when you actually make use of the value assigned to the name. – chepner Mar 14 '22 at 18:18
  • 1
    That makes some sense (I think). If you don't flush, then the writes might build up in memory; if you do flush, ... I don't know. This is deeper into the bowels of CPython than I care to think about :) – chepner Mar 14 '22 at 18:20
  • @chepner Another lightweight usage: `total += x` and printing the total at the end. Reports [`(92, 240)`](https://tio.run/##TU3LCsIwELzvV@wxqQ8qgojQbymhrjWQF5stpF8fY/XQOc0wr7TKO4brPXGt1qfIgsJmIm@cixNAyTigs1kUmzCTuvRdd9MaJIpxzephFz9nMSxKwysy2iMWtAEpLJ7YCKmS9QOwobTiqWz0t3MYsEBiG0Tt52aScdPP0ZOPvKr2/I99e7rWDw). – Kelly Bundy Mar 14 '22 at 18:22
  • @chepner Or simply [print `x` at the end](https://tio.run/##TY3NCsJADITveYocs0VFEUQEn6UsNdaF7g/ZFNKnXxe9dG4D3zdTNv3kdL0XaS3EkkVRxU8c/bLkCcAqPnEJVUl8mpku52G4OQc76FTVi5KDdxYMBzQMCTmtkcUrk1X3AOyxvnQ0KBKS0t6fWcdff42RY5aN@sEfM9faFw). Then it's `(56, 196)`. And to optimize any part of the `x = -x` away, Python would then need to be pretty smart. It can't optimize it away for the last number, so it would need to figure out whether it's the last number. This kind of optimization is really not something CPython does, I think. Might not even be an optimization, as that effort would cost time, too. – Kelly Bundy Mar 14 '22 at 18:36

1 Answers1

8

Short Answer

tracemalloc was started too late to track the inital block of memory, so it didn't realize it was a reuse. In the example you gave, you free 27999860 bytes and allocate 27999860 bytes, but tracemalloc can't 'see' the free. Consider the following, slightly modified example:

import tracemalloc

tracemalloc.start()

xs = list(range(10**6))
print(tracemalloc.get_traced_memory())
for i, x in enumerate(xs):
    xs[i] = -x
print(tracemalloc.get_traced_memory())

On my machine (python 3.10, but same allocator), this displays:

(35993436, 35993436)
(36000576, 36000716)

After we allocate xs, the system has allocated 35993436 bytes, and after we run the loop we have a net total of 36000576. This shows that the memory usage isn't actually increasing by 28 Mb.

Why does it behave this way?

Tracemalloc works by overriding the standard internal methods for allocating with tracemalloc_alloc, and the similar free and realloc methods. Taking a peek at the source:

static void*
tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
{
    PyMemAllocatorEx *alloc = (PyMemAllocatorEx *)ctx;
    void *ptr;

    assert(elsize == 0 || nelem <= SIZE_MAX / elsize);

    if (use_calloc)
        ptr = alloc->calloc(alloc->ctx, nelem, elsize);
    else
        ptr = alloc->malloc(alloc->ctx, nelem * elsize);
    if (ptr == NULL)
        return NULL;

    TABLES_LOCK();
    if (ADD_TRACE(ptr, nelem * elsize) < 0) {
        /* Failed to allocate a trace for the new memory block */
        TABLES_UNLOCK();
        alloc->free(alloc->ctx, ptr);
        return NULL;
    }
    TABLES_UNLOCK();
    return ptr;
}

We see that the new allocator does two things:

1.) Call out to the "old" allocator to get memory

2.) Add a trace to a special table, so we can track this memory

If we look at the associated free functions, it's very similar:

1.) free the memory

2.) Remove the trace from the table

In your example, you allocated xs before you called tracemalloc.start(), so the trace records for this allocation are never put in the memory tracking table. Therefore, when you call free on the initial array data, the traces aren't removed, and thus your weird allocation behavior.

Why is the total memory usage 36000000 bytes and not 28000000

Lists in python are weird. They're actually a list of pointer to individually allocated objects. Internally, they look like this:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD is a macro that expands to some header information all python variables have. It is just 16 bytes, and contains pointers to type data.

Importantly, a list of integers is actually a list of pointer to PyObjects that happen to be ints. On the line xs = list(range(10**6)), we expect to allocate:

  • 1 PyListObject with internal size 1000000 -- true size:
sizeof(PyObject_HEAD) + sizeof(PyObject *) * 1000000 + sizeof(Py_ssize_t)
(     16 bytes      ) + (    8 bytes     ) * 1000000 + (     8 bytes    )
8000024 bytes
  • 1000000 PyObject ints (A PyLongObject in the underlying implmentation)
1000000 * sizeof(PyLongObject)
1000000 * (     28 bytes     )
28000000 bytes

For a grand total of 36000024 bytes. That number looks pretty farmiliar!

When you overwrite a value in the array, your just freeing the old value, and updating the pointer in PyListObject->ob_item. This means the array structure is allocated once, takes up 8000024 bytes, and lives to the end of the program. Additionally, 1000000 Integer objects are each allocated, and references are put in the array. They take up the 28000000 bytes. One by one, they are deallocated, and then the memory is used to reallocate a new object in the loop. This is why multiple loops don't increase the amount of memory.

Carson
  • 2,700
  • 11
  • 24