Short Answer
tracemalloc was started too late to track the initial block of memory, so it
didn't realize the memory was being reused. In the example you gave, you free 27999860 bytes
and allocate 27999860 bytes, but tracemalloc can't 'see' the free. Consider the
following, slightly modified example:
import tracemalloc
tracemalloc.start()
xs = list(range(10**6))
print(tracemalloc.get_traced_memory())
for i, x in enumerate(xs):
    xs[i] = -x
print(tracemalloc.get_traced_memory())
On my machine (python 3.10, but same allocator), this displays:
(35993436, 35993436)
(36000576, 36000716)
After we allocate xs, tracemalloc reports 35993436 bytes, and after we run
the loop we have a net total of 36000576 bytes -- an increase of only about 7 kB.
This shows that the memory usage isn't actually growing by 28 MB.
Why does it behave this way?
tracemalloc works by replacing the interpreter's standard internal allocation
functions with wrappers like tracemalloc_alloc, plus similar free and realloc
wrappers. Taking a peek at the source:
static void*
tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
{
    PyMemAllocatorEx *alloc = (PyMemAllocatorEx *)ctx;
    void *ptr;

    assert(elsize == 0 || nelem <= SIZE_MAX / elsize);

    if (use_calloc)
        ptr = alloc->calloc(alloc->ctx, nelem, elsize);
    else
        ptr = alloc->malloc(alloc->ctx, nelem * elsize);
    if (ptr == NULL)
        return NULL;

    TABLES_LOCK();
    if (ADD_TRACE(ptr, nelem * elsize) < 0) {
        /* Failed to allocate a trace for the new memory block */
        TABLES_UNLOCK();
        alloc->free(alloc->ctx, ptr);
        return NULL;
    }
    TABLES_UNLOCK();
    return ptr;
}
We see that the new allocator does two things:
1.) Call out to the "old" allocator to get memory
2.) Add a trace to a special table, so we can track this memory
If we look at the associated free function, it's very similar:
1.) free the memory
2.) Remove the trace from the table
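The interaction of these two halves can be sketched with a toy model (the ToyTracer class below is purely illustrative, not CPython's implementation): a free for a pointer that has no trace in the table simply leaves the counter untouched.

```python
# Toy model of tracemalloc's traced-memory counter (illustrative only).
class ToyTracer:
    def __init__(self):
        self.traces = {}   # ptr -> size: the "trace table"
        self.current = 0   # net traced bytes

    def on_alloc(self, ptr, size):
        # Mirrors ADD_TRACE: record the block and bump the counter.
        self.traces[ptr] = size
        self.current += size

    def on_free(self, ptr):
        # Mirrors REMOVE_TRACE: only blocks with a trace decrement the counter.
        size = self.traces.pop(ptr, None)
        if size is not None:
            self.current -= size

tracer = ToyTracer()
# Block 0x1 was allocated *before* tracing started, so no trace exists.
tracer.on_free(0x1)               # invisible: counter stays at 0
tracer.on_alloc(0x1, 28_000_000)  # reuse of the same memory looks brand new
print(tracer.current)             # 28000000
```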
In your example, you allocated xs before you called tracemalloc.start(), so
the trace records for that allocation were never put in the memory-tracking
table. Therefore, when the original array data is freed, there are no traces to remove, the freed bytes are never subtracted -- and hence your weird allocation behavior.
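A minimal demonstration of an untracked free (the exact totals are machine- and version-dependent):

```python
import tracemalloc

xs = list(range(10**6))   # allocated before tracing: no traces recorded
tracemalloc.start()

del xs                    # frees ~36 MB, but there are no traces to remove
ys = list(range(10**6))   # largely reuses that memory, yet every block is traced as new
current, peak = tracemalloc.get_traced_memory()
print(current)            # roughly 36000000 -- the reuse is invisible to tracemalloc
```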
Why is the total memory usage 36000000 bytes and not 28000000?
Lists in Python are weird. They're actually arrays of pointers to individually
allocated objects. Internally, they look like this:
typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;
PyObject_HEAD is a macro that expands to the header information all Python
objects share. It is just 16 bytes, holding the reference count and a pointer to the type data.
Importantly, a list of integers is actually a list of pointers to PyObjects
that happen to be ints. On the line xs = list(range(10**6)), we expect to
allocate:
- 1 PyListObject with internal size 1000000 -- true size:
      sizeof(PyObject_HEAD) + sizeof(PyObject *) * 1000000 + sizeof(Py_ssize_t)
    = ( 16 bytes )          + ( 8 bytes )       * 1000000  + ( 8 bytes )
    = 8000024 bytes
- 1000000 PyObject ints (a PyLongObject in the underlying implementation):
      1000000 * sizeof(PyLongObject)
    = 1000000 * ( 28 bytes )
    = 28000000 bytes
For a grand total of 36000024 bytes. That number looks pretty familiar!
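You can sanity-check these numbers with sys.getsizeof (exact header sizes vary slightly between CPython versions, so treat the constants as approximate):

```python
import sys

xs = list(range(10**6))
print(sys.getsizeof(xs))     # about 8 MB: list header plus one 8-byte pointer per element
print(sys.getsizeof(10**5))  # 28 bytes for a small-ish int on a typical 64-bit build

# Header + pointer array + one PyLongObject per element:
total = sys.getsizeof(xs) + sum(sys.getsizeof(x) for x in xs)
print(total)                 # about 36 MB, matching the traced figure
```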
When you overwrite a value in the array, you're just freeing the old value and updating the pointer in PyListObject->ob_item. This means the array structure is allocated once, takes up 8000024 bytes, and lives to the end of the program. Additionally, 1000000 integer objects are each allocated, and references to them are put in the array; they take up the 28000000 bytes. One by one they are deallocated, and the memory is then reused to allocate a new object in the loop. This is why running the loop multiple times doesn't increase the amount of memory.
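The reuse is easy to confirm: run the loop several times under tracing and the traced total barely moves (the exact delta is machine-dependent, but it is nowhere near 28 MB per pass):

```python
import tracemalloc

tracemalloc.start()
xs = list(range(10**6))
before = tracemalloc.get_traced_memory()[0]

for _ in range(3):
    # Each pass frees 10**6 tracked ints and allocates 10**6 new ones,
    # so the additions and removals in the trace table cancel out.
    for i, x in enumerate(xs):
        xs[i] = -x

after = tracemalloc.get_traced_memory()[0]
print(after - before)  # a tiny delta, not 3 * 28 MB
```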