3

As a programmer, I generally try to avoid the del statement because it is often an extra complication that Python programs don't need. However, when browsing the standard library (threading, os, etc.) and the pseudo-standard library (numpy, scipy, etc.) I see it used a non-zero amount of times, and I'd like to better understand when the del statement is and isn't appropriate.

Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program. It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through. However, I can also see a world where the extra instruction takes up more time than it saves.

My question is: does anyone have any interesting code snippets that demonstrate cases where del significantly changes the speed of the program? I'm most interested in cases where del improves the execution speed of a program, although non-trivial cases where del can really hurt are also interesting.

Erotemic
  • 4,806
  • 4
  • 39
  • 80
  • 2
    "It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through" - that's not how variable lookup works in Python, and it's almost certainly not what was going through the author's head in the code you've seen that uses `del`. – user2357112 Nov 27 '18 at 00:42
  • Looking at one of the examples you give (`threading`), that module uses `del` to avoid global namespace clutter, to remove dict entries, and to avoid traceback reference cycles. – user2357112 Nov 27 '18 at 00:45

2 Answers

3

The main reason that standard Python libraries use del is not for speed but for namespace decluttering ("avoiding namespace pollution" is another term I believe I have seen for this). As user2357112 noted in a comment, it can also be used to break a traceback cycle.
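The traceback-cycle use can be sketched like this (a minimal illustration of the pattern, not code taken from `threading`; `log_error` is a hypothetical helper):

```python
import sys

def log_error():
    """Report the exception currently being handled."""
    exc_type, exc, tb = sys.exc_info()
    try:
        print(exc_type.__name__, exc)
    finally:
        # tb references this frame, and this frame's locals reference tb:
        # a reference cycle that would keep the frame (and everything it
        # refers to) alive until the cyclic GC runs. del breaks the cycle
        # so the frame can be freed promptly.
        del tb

try:
    1 / 0
except ZeroDivisionError:
    log_error()
```

Without the `del`, nothing is leaked outright, but the frame lingers until the garbage collector notices the cycle.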

Let's take a concrete example: line 58 of types.py in the cpython implementation reads:

del sys, _f, _g, _C, _c, # Not for export

If we look above, we find:

def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None)         # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)

def _g():
    yield 1
GeneratorType = type(_g())

_f and _g are two of the names being del'ed; as the comment says, they are "not for export".¹

You might think this is covered via:

__all__ = [n for n in globals() if n[:1] != '_']

(which is near the end of that same file), but as What's the python __all__ module level variable for? (and the linked Can someone explain __all__ in Python?) note, these affect the names exported via from types import *, rather than what's visible via import types; dir(types).
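The distinction is easy to verify against the stdlib `types` module itself (a small sketch; it assumes a recent CPython where types.py still builds `__all__` this way and the `del` has already run at import time):

```python
import types

# __all__ excludes every underscore-prefixed name, so none of them are
# copied by "from types import *":
assert all(not n.startswith('_') for n in types.__all__)

# But __all__ alone would not hide the helpers from dir(types);
# it's the del statement that actually removed them:
assert '_f' not in dir(types)

# The public objects built from those helpers survive:
assert 'FunctionType' in dir(types)
```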

It's not strictly necessary to clean up your module namespace, but doing so prevents users from reaching into it and relying on private helpers that were never meant to be part of the module's interface. So it's good for a couple of purposes.


¹ Looks like someone forgot to update this to include _ag. _GeneratorWrapper is harder to hide, unfortunately.

torek
  • 448,244
  • 59
  • 642
  • 775
  • Does namespace decluttering have any impact on program speed? My thinking is that a large namespace (aka hash table) may have a speed penalty (due to hash collisions, loading pages into RAM, cache efficiency, etc.), or are CPython's hash tables so fast it doesn't matter? – Erotemic Nov 27 '18 at 01:04
  • It doesn't, really. CPython uses its own dictionary implementation for most of these (there are some special cases with slots, I think, but for those `del` is entirely irrelevant) and its lookup efficiency is quite good and largely independent of dict-size. See https://stackoverflow.com/questions/513882/python-list-vs-dict-for-look-up-table for some performance stats. – torek Nov 27 '18 at 01:13
  • Related: https://stackoverflow.com/questions/31442906/how-to-force-python-dictionary-to-shrink (however, dict implementation has changed at least twice during Python's lifetime, so not sure if the answers here can be counted-on). You'd need to have the larger and smaller dictionaries be right across a shrink/expand point for this to have any effect, though. – torek Nov 27 '18 at 01:17
0

Specifically, I'm curious about the relationship between the Python del statement and the efficiency of a Python program.

As far as performance is concerned, del (excluding index deletion like del x[i]) is primarily useful for GC purposes. If you have a variable pointing to some large object that is no longer needed, deleting that variable will (assuming there are no other references to the object) deallocate it (in CPython this happens immediately, since it uses reference counting). This could make the program faster if you'd otherwise be filling your RAM/caches; the only way to know is to actually benchmark it.
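The immediate-deallocation behavior can be observed with a weak reference (a CPython-specific sketch; `Big` is a hypothetical stand-in for a memory-hungry object):

```python
import weakref

class Big:
    """Hypothetical stand-in for a large object."""
    def __init__(self):
        self.payload = [0] * 1_000_000

big = Big()
ref = weakref.ref(big)   # lets us observe when the object dies

del big                  # last reference gone: refcount hits zero ...
assert ref() is None     # ... and CPython deallocates immediately
```

On implementations without reference counting (e.g. PyPy), the object would only be freed at some later GC pass, so the assertion is not guaranteed there.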

It seems to me that del might help a program run faster by reducing the amount of clutter lookup instructions need to sift through.

Unless you're using thousands of variables (which you shouldn't be), it's exceedingly unlikely that removing variables using del will make any noticeable difference in performance.
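A rough way to check this is to time a name lookup against namespaces of very different sizes (a sketch using `timeit`; the exact numbers vary by machine, but the lookup cost does not grow with the number of names):

```python
import timeit

ns_small = {'x': 1}
ns_big = {f'var{i}': i for i in range(100_000)}
ns_big['x'] = 1

# timeit compiles the statement into a function whose globals are the
# supplied dict, so evaluating 'x' is a real global-name lookup in
# each namespace.
t_small = timeit.timeit('x', globals=ns_small, number=200_000)
t_big = timeit.timeit('x', globals=ns_big, number=200_000)

# Both times come out essentially the same: dict lookup is O(1) on
# average, independent of how many other names the namespace holds.
print(t_small, t_big)
```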

arshajii
  • 127,459
  • 24
  • 238
  • 287
  • Python will remove those references anyway when they go out of scope, barring cases like a closure needing them, and it'll be faster than if you used `del` in most cases. – user2357112 Nov 27 '18 at 00:53
  • @user2357112 Sure, but there are definitely cases where you don't want to wait for a variable to go out of scope before deallocating the object it points to. – arshajii Nov 27 '18 at 00:56
  • @arshajii that question is the one I'm really trying to get at. Specifically in the case of large, difficult-to-refactor, monolithic functions. Would `del` possibly be able to give us a meaningful speed boost in a case like that. I guess another way to phrase it is: how much does namespace de-cluttering affect program speed? – Erotemic Nov 27 '18 at 00:59
  • @Erotemic Any performance boost wouldn't come from "namespace decluttering", but from using less memory (which could be noticeable if you're filling your RAM/caches). – arshajii Nov 27 '18 at 01:01
  • So perhaps the question could be answered as a function of system's hardware. At some point if the namespace is too big, is it correct that the hash table will no longer fit in cache and thus result in slower execution time? In this case, I'd be interested in seeing benchmarks that measure the slowdown vs dict size (wrt CPU cache size). – Erotemic Nov 27 '18 at 01:09
  • @Erotemic I meant filling up cache with other data, then using `del` on that (e.g. `v = [0]*1000; del v`). This is completely independent of variable names and such. – arshajii Nov 27 '18 at 01:37