Modules (and packages) are a great Pythonic way to divide your program into separate namespaces, which seems to be an implicit goal of this question. Indeed, as I was learning the basics of Python, I felt frustrated by the lack of a block scope feature. However, once I understood Python modules, I could more elegantly realize my previous goals without the need for block scope.
As motivation, and to point people in the right direction, I think it's useful to provide explicit examples of some of Python's scoping constructs. First I explain my failed attempt at using Python classes to implement block scope. Next I explain how I achieved something more useful using Python modules. At the end I outline a practical application of packages to loading and filtering data.
Attempting block scope with classes
For a few moments I thought that I had achieved block scope by sticking code inside of a class declaration:
x = 5
class BlockScopeAttempt:
    x = 10
    print(x) # Output: 10
print(x) # Output: 5
Unfortunately this breaks down when a function is defined:
x = 5
class BlockScopeAttempt:
    x = 10
    print(x) # Output: 10
    def printx2():
        print(x)
    printx2() # Output: 5!!!
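That’s because a class body is not an enclosing scope for the functions defined inside it. Name lookup in such a function skips the class namespace entirely and goes from the function's own locals to the module's global scope. The easiest (though not the only) way to fix this is to explicitly specify the class: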
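x = 5
class BlockScopeAttempt:
    x = 10
    print(x) # Output: 10
    def printx2():
        print(BlockScopeAttempt.x) # Added class name
# Call after the class statement: the name BlockScopeAttempt is not bound
# until the class body has finished executing.
BlockScopeAttempt.printx2() # Output: 10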
This is not so elegant because one must write functions differently depending on whether or not they’re contained in a class.
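As a sketch of one of the other fixes hinted at above (my own illustration, not the only option), the class-level value can be captured as a default argument. Default values are evaluated in the class body, where the class-level x is visible. The name inside is hypothetical and only there to hold the result.

```python
x = 5

class BlockScopeAttempt:
    x = 10

    # The default value is evaluated here, in the class body,
    # so it captures the class-level x (10), not the global x (5).
    def printx2(x=x):
        return x

    # Called while the class body is still executing.
    inside = printx2()

print(BlockScopeAttempt.inside)  # value captured from the class body
print(x)                         # the global x is untouched
```

This still requires writing the function differently from a module-level one, so it shares the drawback described above.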
Better results with Python modules
Modules are very similar to static classes, but modules are much cleaner in my experience. To do the same with modules, I make a file called my_module.py in the current working directory with the following contents:
x = 10
print(x) # (A)
def printx():
    print(x) # (B)
def alter_x():
    global x
    x = 8
def do_nothing():
    # Here x is local to the function.
    x = 9
Then in my main file or interactive (e.g. Jupyter) session, I do
x = 5
from my_module import printx, do_nothing, alter_x # Output: 10 from (A)
printx() # Output: 10 from (B)
do_nothing()
printx() # Output: 10
alter_x()
printx() # Output: 8
print(x) # Output: 5
from my_module import x # Copies current value from module
print(x) # Output: 8
x = 7
printx() # Output: 8
import my_module
my_module.x = 6
printx() # Output: 6
To explain: each Python file defines a module, which has its own global namespace. The import my_module statement lets you access the variables in this namespace with the dot syntax (for example, my_module.x). I think of modules like static classes.
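The difference between from ... import x (which copies the current binding) and attribute access through the module can be reproduced in a single script. The following is a minimal sketch that writes a throwaway module to a temporary directory. The names scratch_module and get_x are hypothetical stand-ins for my_module and printx.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source for a throwaway module with its own global namespace.
source = textwrap.dedent("""
    x = 10
    def get_x():
        return x
    def alter_x():
        global x
        x = 8
""")

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "scratch_module.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import scratch_module
        from scratch_module import x  # copies the binding as it is right now
        scratch_module.alter_x()      # rebinds x inside the module only
        # (copied name, module attribute, value seen by the module's function)
        results = (x, scratch_module.x, scratch_module.get_x())
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("scratch_module", None)

print(results)
```

The copied name keeps its old value, while the module attribute and the module's own functions see the change.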
If you are working with modules in an interactive IPython or Jupyter session, you can execute these two lines at the beginning
%load_ext autoreload
%autoreload 2
and modules will be automatically reloaded when their corresponding files are modified.
Packages for loading and filtering data
The idea of packages is a slight extension of the modules concept. A package is a directory containing a (possibly blank) __init__.py file, which is executed upon import. Modules/packages within this directory can be accessed with the dot syntax.
For data analysis, I often need to read a large data file and then interactively apply various filters. Reading a file takes several minutes, so I only want to do it once. Based on what I learned in school about object-oriented programming, I used to believe that one should write the code for filtering and loading as methods in a class. A major disadvantage of this approach is that if I then redefine my filters, the definition of my class changes, so I have to reload the entire class, including the data.
Nowadays with Python, I define a package called my_data which contains submodules named load and filter. Inside of filter.py I can do a relative import:
from .load import raw_data
If I modify filter.py, then autoreload will detect the changes. It doesn't reload load.py, so I don't need to reload my data. This way I can prototype my filtering code in a Jupyter notebook, wrap it as a function, and then cut-paste from my notebook directly into filter.py. Figuring this out revolutionized my workflow, and converted me from a skeptic to a believer in the “Zen of Python.”
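Outside of IPython, the same effect can be sketched with importlib.reload from the standard library, which here stands in for autoreload. This is a minimal illustration under assumptions: the package name demo_data, the function apply, and the tiny list used in place of a slow data load are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep this quick demo free of cached bytecode

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "demo_data")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # Stand-in for the slow load: the list is built once, when load.py runs.
    (pkg / "load.py").write_text("raw_data = list(range(10))\n")
    (pkg / "filter.py").write_text(
        "from .load import raw_data\n"
        "def apply():\n"
        "    return [v for v in raw_data if v % 2 == 0]\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        from demo_data import filter as data_filter, load
        data_before = load.raw_data
        first = data_filter.apply()

        # Redefine the filter on disk, then reload only filter.py.
        (pkg / "filter.py").write_text(
            "from .load import raw_data\n"
            "def apply():\n"
            "    return [v for v in raw_data if v > 6]\n"
        )
        importlib.reload(data_filter)
        second = data_filter.apply()
        # load.py was not re-executed, so the data object is unchanged.
        same_data = load.raw_data is data_before
    finally:
        sys.path.remove(tmp)
        for name in ("demo_data", "demo_data.load", "demo_data.filter"):
            sys.modules.pop(name, None)

print(first, second, same_data)
```

Reloading the filter module re-runs its relative import, but that import finds the already-loaded load module in sys.modules, so the data is not read again.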