I am struggling with properly organising Python code. My struggle stems from namespaces in Python. More specifically, as I understand, every module has its own specific namespace and if not stated specifically, elements from one namespace will not be available in another namespace.
To give an example what I mean, consider the function subfile1.py:
def foo();
print(a)
Now if I want the foo()
function to print a
from the namespace of another module/script, I will have to set it explicitly for that subfile1 module; e.g. like in main.py:
import subfile1
a = 4
subfile1.a = 4
subfile1.foo()
When I code (in R), it’s generally to provide data analytics reports. So I have to import data, clean data and perform analyses on this data. My coding projects (to be re-run every couple of months) are then organised as follows: I have a main script from which I run other sub-scripts. I find this very handy I have separate sub-scripts for:
- Setting paths
- Loading data
- Data cleaning
- Specific analyses
Simply looking at the names of the sub-scripts listed in the main script gives me then a good overview of what is being done and where I need to add additional sub-scripts if needed.
With Python I don’t find this an easy way of working because of the namespaces. If I would like to keep this way of working in Python, it seems to me that I would have to import all sub-scripts as modules and set the necessary variables for each module explicitly (either in the module or in the main script as in the example above).
So I thought that maybe my way of working is simply not optimal, at least in a Python setting. Could anyone tell me how best to organise the project I have in mind? Or put differently, what is a good way of organising code if the same data/variables has to be used at various places in the code?
Update:
After reading the comments, I made another example. Now with classes. Any comments on whether this is an appropriate way of structuring code or not would be very welcome. My main aim is to split the code in bits and pieces so a third party can quickly gauge what is going on and make changes to the code where necessary.
I have a master_file.py
in which I run the code:
from sub_file_1 import DoSomething
from sub_file_2 import DoSomethingElse
from sub_file_3 import CombineTheStuff
x1 = DoSomething.c #could e.g. be import from a data source and transform the data
x2 = DoSomethingElse.d #could e.g. be import from another data source and transform the data taking into its particularities
y = CombineTheStuff(x1, x2) # e.g. with the data imported and cleaned (x1 and x2) I can now estimate a particular model
print(x1)
print(x2)
print(y.fin_result())
sub_file_1.py
contains (in reality these sub_files are very lengthy)
from param_file import GeneralPar
class DoSomething():
c = GeneralPar.a + GeneralPar.b
sub_file_2.py
contains
from param_file import GeneralPar
class DoSomethingElse():
d = GeneralPar.a * 4
sub_file_3.py
contains
class CombineTheStuff():
def __init__(self, input1, input2):
self.input1 = input1
self.input2 = input2
def fin_result(self):
k = self.input1 * self.input2 - 0.01
return(k)