I was looking at the sizes of pickled objects and noticed that non-empty lists change size after unpickling: they grow by 24 bytes, while empty lists stay the same size. If I use the getsizeof method here, it shows that the same happens with nested lists, with the size growing by 24 bytes for each non-empty list.

How does this increase happen?

A small example:

import pickle
import sys

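li = [1]

with open('test.p', 'wb') as f:
    pickle.dump(li, f)
# sys.getsizeof reports the shallow size of the list object itself
print(sys.getsizeof(li))

with open('test.p', 'rb') as f:
    li2 = pickle.load(f)
# the unpickled list reports a larger size than the original
print(sys.getsizeof(li2))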
BurnNote
  • If you `copy` li2 it will occupy again 80 bytes. But if you `deepcopy` it, it will occupy 104. – Maciek Sep 17 '20 at 12:16
  • Lists usually have some extra space allocated, so that the entire memory block holding their contents doesn't have to be reallocated every time you modify the list. The exact amount of this extra space will vary depending on the exact sequence of events that led to the list's current contents. – jasonharper Sep 17 '20 at 13:07
  • @jasonharper To me, that seems like it would be a reason why newly reloaded lists would take less space, not more. You wouldn't save empty space. Also, wouldn't the same thing happen with a dict? Because they don't have an increase in size after unpickling. – BurnNote Sep 17 '20 at 13:53

1 Answer

The rebuilding process for pickled lists is different from the "building from scratch" process for list literals. When you write a list literal, the list is allocated with exactly the capacity its initial contents need. When a list is rebuilt from a pickle, the unpickler creates an empty list, then appends items one by one as they're unpickled, with overallocation occurring every time capacity is exhausted.
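The "empty list, then append" behaviour can be seen in the pickle stream itself. The following is a minimal sketch, assuming CPython and the standard `pickletools` module; `protocol=2` is chosen here only to keep the stream short, and the variable names are illustrative.

```python
import pickle
import pickletools

# Serialize a one-element list; protocol 2 keeps the opcode stream short.
data = pickle.dumps([1], protocol=2)

# genops yields (opcode, argument, position) for each instruction in the stream.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names)
```

The printed names should include `EMPTY_LIST` followed later by an append opcode, which matches the description above: the list starts out empty and its contents are added afterwards, rather than being allocated at its final size.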

You can see the size discrepancy by comparing against a list built manually with append (because for all practical purposes, that's what unpickling is doing too):

import pickle
import sys

literal = [1]
incremental = []
incremental.append(1)
pickled = pickle.loads(pickle.dumps(literal, -1))

print("Literal:", sys.getsizeof(literal))
print("Incremental:", sys.getsizeof(incremental))
print("Pickled:", sys.getsizeof(pickled))

which on TIO produces:

Literal: 80
Incremental: 104
Pickled: 104

The numbers vary by interpreter (my own Python build gets 64, 88, 88), but the pattern is the same: over-allocation (to achieve amortized O(1) append costs) affects incremental and pickle-based list construction, but not list literals.
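To see the over-allocation pattern directly, you can record the reported size after each append. This is a rough sketch assuming CPython; the exact byte counts and the points at which the size jumps depend on the interpreter version and build.

```python
import sys

items = []
sizes = []
for i in range(20):
    items.append(i)
    # Shallow size of the list object, including any spare capacity.
    sizes.append(sys.getsizeof(items))

# Print the list length next to the size reported at that length.
for length, size in enumerate(sizes, start=1):
    print(length, size)
```

The reported size should stay flat across several consecutive appends and then jump when the spare capacity runs out, instead of growing by one slot per item.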

ShadowRanger