Python is a "garbage collected" language. One consequence of this is that memory is automatically allocated and freed as needed. This creates memory fragmentation, which can break up transfers to the CPU caches. It's also not possible to change the layout of a data structure directly in memory, which means that one transfer on the bus might not contain all the relevant information for a computation, even though it might all fit within the bus width. This hurts any prospect of keeping the L1/L2 caches filled with the relevant data for the next computation.
Another problem comes from Python's dynamic types and the fact that it is not compiled. Many C developers realize at some point that the compiler is usually smarter than they are. The compiler can perform many tricks to affect how things are laid out, how the CPU will run certain instructions and in what order, and the best way to optimize them. Python, however, is not compiled to machine code ahead of time and, to make matters worse, has dynamic types, which means that inferring possible optimizations algorithmically is drastically more difficult, since code functionality can be changed at runtime.
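A small sketch of why dynamic typing makes this hard: the same function body below means something different for each argument type, and its behavior can even be changed while the program is running. The `scale` function and the `Celsius` class are hypothetical names used only for this illustration.

```python
def scale(x, factor):
    # What `*` does here is decided at runtime from the type of `x`.
    return x * factor

# Same source line, four different operations.
results = [scale(3, 2), scale(1.5, 2), scale("ab", 2), scale([1], 2)]

class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __mul__(self, factor):
        return Celsius(self.degrees * factor)

doubled = scale(Celsius(10), 2)

# The multiplication behavior of an existing class is swapped at runtime.
Celsius.__mul__ = lambda self, factor: "rebound at runtime"
rebound = scale(Celsius(10), 2)

print(results)
print(doubled.degrees, rebound)
```

Because nothing about `scale` pins down what `x * factor` will do until the call actually happens, an ahead-of-time optimizer cannot safely pick one machine instruction for it the way a C compiler could for two `double` operands.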
As mentioned in the comments, there are ways to mitigate such problems, foremost being Cython, an extension of Python that allows Python code to be compiled. (CPython, by contrast, is the standard interpreter rather than a compiler.)