
Recently I did a simple test in Python, wondering what the memory impact of huge dictionaries (~10 million keys) is and how to empty them efficiently (not all keys at once). I use the clear() method as a reference, and I look at the amount of memory still in use after all keys have been removed.

In my tests, the clear() method is very good at deleting the keys and giving the memory back to the OS, whereas when I use del or pop, the memory still consumed after all keys have been removed is quite large. To measure the memory used by an object I use a get_size function found online, included at the beginning of my source code (also available here).

How can the clear method be so efficient compared to pop or del?

The code for my tests can be found here, as well as the test results.
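
A minimal sketch of the kind of comparison I mean (simplified, not my actual test code: far fewer keys, and `sys.getsizeof` of the dict itself instead of the recursive get_size, so the numbers are only a rough proxy):

```python
import sys

N = 1_000_000  # reduced from ~10 million so the sketch runs quickly

# Empty the dict one key at a time with pop()
d = {i: None for i in range(N)}
print("after filling:  ", sys.getsizeof(d))   # tens of MB
for i in range(N):
    d.pop(i)
print("after pop loop: ", sys.getsizeof(d))   # still tens of MB

# Empty an identical dict with clear()
d = {i: None for i in range(N)}
d.clear()
print("after clear():  ", sys.getsizeof(d))   # back to a few dozen bytes
```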

  • The `clear()` method knows that it's removing everything, so it can reclaim all the memory. Deleting individual elements in the middle of a dictionary doesn't allow the rest of the memory to be reclaimed unless it rehashes the dictionary. – Barmar Dec 16 '20 at 22:44
  • In addition, you can force a resize by replacing the dictionary with a copy. – scenox Dec 16 '20 at 23:03
  • @scenox yep, that's how I usually do it. In the rare cases where I've built a giant dictionary that I have to whittle down, I will do all the `del d[key]` for efficiency's sake, but then one final `d = dict(d.items())`, although `d.copy()` might be sufficient (see the sketch after these comments). – juanpa.arrivillaga Dec 16 '20 at 23:54
  • @juanpa.arrivillaga thanks for the workaround :) – Guillaume Villena Dec 19 '20 at 17:07
  • You should cite and link to the original author of the recursive `get_size()`, whoever they are. Is it Wissam Jarjoui from 2016 [https://stackoverflow.com/a/38515297/202229]? – smci Jun 01 '23 at 23:53
  • When you say *"how to empty them efficiently (not all keys at once)"*, what does that mean (memory-efficient? shorter downtime for locking dictionary?) That you assume it's more efficient to empty them piecemeal? Or that you functionally require that you can't clear the entire dictionary at once? I don't think those assumptions are correct. – smci Jun 01 '23 at 23:57
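
A quick illustration of the copy workaround suggested in the comments above (a sketch; `sys.getsizeof` is used as a rough stand-in for the recursive get_size):

```python
import sys

d = {i: None for i in range(1_000_000)}
for i in range(1_000_000 - 10):
    del d[i]                   # whittle it down to 10 live keys

print(sys.getsizeof(d))        # still sized for ~1 million entries
d = dict(d.items())            # rebuild: the new dict is sized for 10 keys
print(sys.getsizeof(d))        # a few hundred bytes
```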

1 Answer


To avoid excessive hash table rebuilds, pop and del don't resize a dict's underlying hash table. Removing entries one by one will never shrink a dict's hash table.

Resizing only happens if a dict runs out of room on insertion (which can shrink the dict, due to how dummy entries work in the implementation), or if unrelated technical details force a rebuild (like having to un-split a split-table dict).

clear will throw away the old hash table entirely, though.
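
A rough way to watch this happen, using `sys.getsizeof` as a proxy for the table size (a sketch; the exact byte counts and the iteration at which the rebuild fires depend on the CPython version, and the shrink-on-resize behaviour is what recent CPython 3.x does):

```python
import sys

d = {i: None for i in range(1_000_000)}
for i in range(1_000_000):
    del d[i]
print(sys.getsizeof(d))            # still sized for ~1 million entries

# Keep inserting (and deleting) fresh keys. Once the leftover free slots are
# exhausted, an insertion forces a rebuild sized for the few live entries.
for i in range(2_000_000):         # comfortably more than enough iterations
    d[i] = None
    del d[i]
    if sys.getsizeof(d) < 1000:    # stop once a rebuild has shrunk the table
        break
print(sys.getsizeof(d))            # now small: the shrink happened on insertion

d = {i: None for i in range(1_000_000)}
d.clear()
print(sys.getsizeof(d))            # clear() drops the old table immediately
```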
