Pythonian structure?

Question

I'm someone who is semi-well versed in MATLAB, but I'm trying to move those skills to Python in hopes of future job prospects. When doing machine learning, for example, I like MATLAB because I can keep my main function clean, like this:

 main.m
 ------------
 prescreen_fn(directory,threshold) %a prescreen function that is run
 plot_prescreen_hits(directory) %plot and print prescreen hits
 extract_features(directory,fft_size) %extract features from prescreen hit locations
 generate_train_test(directory) %parse training and testing data
 SVM_train_test(directory) %perform SVM training and testing
 -----------

Well, you get the point. It's nice to have a clean main function that makes it easy to pass along variables defined by the user, etc.

The problem is I don't know the best way to do this in Python. I've read all over Stack Exchange that it's bad to call Python scripts from other scripts, and even then passing variables is difficult. I also don't want one massive script where I define lots of Python code at the top and then call it at the bottom.

Apologies if this is very vague, but the general structure of how Python "should" look is confusing me.

Thanks

I don't know what you've read, but definitely do divide your large piece of code into several files/modules and packages. — thebjorn, Aug 26 '14 at 20:52
I have always used modules and main() functions. Perhaps you've slightly misinterpreted what you've read. Could you maybe link some of these prohibitions against calling scripts? — Colin P. Hill, Aug 26 '14 at 20:54
I don't get the point. The code you have written might well be valid python code, just replace '%' with '#'... — Emanuele Paolini, Aug 26 '14 at 20:54
As an aside, the accepted adjective is "Pythonic" (which, for some, is also a compliment even higher than "elegant"). — Colin P. Hill, Aug 26 '14 at 21:00
@thebjorn I have read to divide code into packages and modules; I suppose my issue is how to structure them properly. To the other kind individuals who responded: I was specifically referring to Python scripts. What I meant is that I've read you shouldn't have one main Python script that just calls other scripts, and that these "other scripts" should instead be modules. Is this not correct? I'm having trouble understanding how to create these modules. — user2208604, Aug 26 '14 at 22:02
@user2208604 - It's all a matter of scale. Start simple. Just put whatever code you'd like in a file called `blah.py`, and you can then do `import blah; blah.foo()`. Start simple, and add complexity when you need it. — Joe Kington, Aug 26 '14 at 22:07

score 5 · Accepted Answer · edited May 23 '17 at 12:05

Your question may get closed as being off-topic or too broad, but I think it's a good question if rephrased as "what's the python equivalent of this code".

Generally speaking, this is something that a lot of folks coming from MATLAB get confused by. In Python, things are separated into "namespaces", and you need to explicitly import functions, variables, etc. from other files.

Common high-level structure of code

In MATLAB (if I remember correctly), you can't have functions in the same file as "bare" statements. In Python you can. However, you can't call a function before the def statement that defines it has run.

In other words, you can do:

def foo():
    print('bar')

foo()

but not:

foo()  # NameError: name 'foo' is not defined

def foo():
    print('bar')

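Therefore, because you typically want the "outline-level" code at the top of the file, it's common to put it into a function and then call that function at the bottom, after the other functions have been defined. This works because names used inside a function body are only looked up when the function is called, so main can refer to functions defined further down as long as it isn't called until their def statements have run. Typically you'd call this function main, but you're free to name it whatever you'd like.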
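A small sketch of that rule, using placeholder function names (main, helper and too_early are illustrative only): a name inside a function body is not resolved until the function runs, whereas a call at module level needs the def to have executed already.

```python
def main():
    # 'helper' is not defined yet when this def statement runs. That is
    # fine: the name is only looked up when main() is actually called.
    return helper() + 1

def helper():
    return 41

result = main()   # both defs have run by now, so 'helper' resolves
print(result)     # 42

# At module level, calling a function before its def has run fails.
try:
    too_early()
except NameError as exc:
    caught = str(exc)
    print("NameError:", caught)

def too_early():
    pass
```

This is why the usual layout (main first, helpers below, the call to main at the very bottom) works.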

As a quick example:

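def main():
    directory = load_data()
    threshold, fft_size = 10, 1000

    prescreen_fn(directory, threshold)      # a prescreen function that is run
    plot_prescreen_hits(directory)          # plot and print prescreen hits
    extract_features(directory, fft_size)   # extract features from prescreen hit locations
    generate_train_test(directory)          # parse training and testing data
    SVM_train_test(directory)               # perform SVM training and testing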

def prescreen_fn(directory, threshold):
    """A prescreen function that is run. Ideally this would be a
    more informative docstring."""
    pass

def plot_prescreen_hits(directory):
    pass

def extract_features(directory,fft_size):
    pass

def generate_train_test(directory):
    pass

def SVM_train_test(directory):
    pass

def load_data():
    pass

if __name__ == '__main__':
    main()

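The last part probably looks a bit confusing. When a file is run directly, Python sets its __name__ variable to "__main__"; when it is imported, __name__ is the module's name instead. So the block basically says "execute this code only if this file is run directly. If we're just importing functions from it, don't run anything yet." (There are many explanations of this, e.g. the Stack Overflow question "What does if __name__ == "__main__": do?")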

If you wanted, you could just do:

def main():
    ...

def other_things():
    ...

main()

If you just run the file, you'll get the same result. The difference is in what happens when we import this code from somewhere else. (In the first example, main wouldn't be called while in the second it would.)
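To see the difference concretely, the sketch below writes two throwaway modules to a temporary directory, one with the if __name__ == '__main__': guard and one without, imports both, and then runs the guarded one as a script with runpy.run_path(..., run_name="__main__"), which sets __name__ the way running the file directly would. The module names guarded and unguarded and the calls list are made up for the demonstration.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that differ only in how they call main().
GUARDED = """
calls = []
def main():
    calls.append("main ran")
if __name__ == '__main__':
    main()
"""
UNGUARDED = """
calls = []
def main():
    calls.append("main ran")
main()
"""

tmp = Path(tempfile.mkdtemp())
(tmp / "guarded.py").write_text(GUARDED)
(tmp / "unguarded.py").write_text(UNGUARDED)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import guarded, unguarded

print(guarded.__name__, guarded.calls)      # guarded []
print(unguarded.__name__, unguarded.calls)  # unguarded ['main ran']

# Running the guarded file as a script: __name__ is "__main__", so main() runs.
as_script = runpy.run_path(str(tmp / "guarded.py"), run_name="__main__")
print(as_script["calls"])                   # ['main ran']

sys.path.remove(str(tmp))
```

Importing the unguarded module triggers main() as a side effect, which is usually what you want to avoid in a file others will import.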

Calling functions in other files

As things grow, you might decide to split some of that into separate files. For example, we might put some of the functions in a file called data.py and others in a file called model.py. We can then import functions from these files into another file where the "pipeline" is built up (we might even call this one main.py, or maybe something more descriptive).

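Unlike MATLAB, we need to explicitly import these files. I won't go into the details here, but import basically searches the directories listed in sys.path, in order, for a module (a .py file) or package (a directory with a specific structure) with the specified name, and uses the first match. When you run a script, the first entry in sys.path is the directory containing that script, so local files are found first and a local file can shadow a library module of the same name. (Python 3 also dropped the implicit relative imports that Python 2 performed inside packages.)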
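A minimal sketch of that search order, assuming a throwaway temporary directory stands in for the folder your script lives in: import walks sys.path front to back and stops at the first match. Here a hypothetical local colorsys.py (colorsys is a small pure-Python standard-library module) is placed at the front of sys.path and wins over the real one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A hypothetical local file that shares its name with the pure-Python
# standard-library module "colorsys".
tmp = Path(tempfile.mkdtemp())
(tmp / "colorsys.py").write_text("WHO_AM_I = 'local copy'\n")

# Put the directory at the front of sys.path, which is where Python puts
# the directory of the script being run.
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
sys.modules.pop("colorsys", None)   # drop any cached copy so the search really happens

import colorsys

found = getattr(colorsys, "WHO_AM_I", "standard library")
origin = colorsys.__file__
print(found)    # local copy
print(origin)   # path to colorsys.py inside the temporary directory

# Undo the experiment so later imports get the real module again.
sys.path.remove(str(tmp))
sys.modules.pop("colorsys", None)
```

This is also a common source of confusing bugs: naming your own file after a library module you later want to import.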

In the example below, import data will import the functions and variables in the file "data.py" (and the same for import model). The functions, etc. in that file live in a "namespace" called data, so we need to refer to them that way. (Note that you can do from data import * to bring them all into the current namespace, but you really, really should avoid that unless you're in an interactive shell.)
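A short illustration, using the standard math module, of namespaced imports versus a star import. The pow example is one concrete way a star import bites: math has its own pow, which silently replaces the built-in one.

```python
import math

# Plain import: everything stays inside the "math" namespace.
print(math.sqrt(16))          # 4.0

# Importing a single name is explicit about what you are pulling in.
from math import sqrt
print(sqrt(16))               # 4.0

# A star import dumps every public name into this namespace,
# and can silently replace names you already had.
before = pow(2, 3)            # built-in pow -> 8 (an int)
from math import *
after = pow(2, 3)             # now math.pow -> 8.0 (a float)
print(before, after)          # 8 8.0
print(pow is math.pow)        # True
```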

# main.py
import data
import model

directory = data.load_data()
threshold, fft_size = 10, 1000

data.prescreen_fn(directory, threshold)
data.plot_prescreen_hits(directory)
data.extract_features(directory, fft_size)
model.generate_train_test(directory)
model.SVM_train_test(directory)

Notice that I didn't bother wrapping this one into a main function. We certainly could have. The reason I didn't do that here is that you presumably wouldn't ever want to import something from this short "main.py" file. Therefore we don't need to run things behind an if __name__ == '__main__': conditional.
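In a real project, data.py, model.py and main.py would simply be three files in the same folder. So that this sketch runs as a single script, it writes hypothetical data.py and model.py stubs into a temporary directory and puts that directory on sys.path; the stub bodies (which only record their calls in a log list) and the placeholder return value of load_data are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents for data.py and model.py; each stub just logs
# its call so the order of the pipeline is visible.
DATA_PY = '''
log = []

def load_data():
    return "some/directory"   # placeholder value

def prescreen_fn(directory, threshold):
    log.append(("prescreen_fn", directory, threshold))

def plot_prescreen_hits(directory):
    log.append(("plot_prescreen_hits", directory))

def extract_features(directory, fft_size):
    log.append(("extract_features", directory, fft_size))
'''

MODEL_PY = '''
log = []

def generate_train_test(directory):
    log.append(("generate_train_test", directory))

def SVM_train_test(directory):
    log.append(("SVM_train_test", directory))
'''

project = Path(tempfile.mkdtemp())
(project / "data.py").write_text(DATA_PY)
(project / "model.py").write_text(MODEL_PY)
sys.path.insert(0, str(project))
importlib.invalidate_caches()
for name in ("data", "model"):
    sys.modules.pop(name, None)   # make sure our stubs are the ones imported

# --- the body of main.py ---
import data
import model

directory = data.load_data()
threshold, fft_size = 10, 1000

data.prescreen_fn(directory, threshold)
data.plot_prescreen_hits(directory)
data.extract_features(directory, fft_size)
model.generate_train_test(directory)
model.SVM_train_test(directory)

print(data.log)
print(model.log)
```

Each module keeps its own log, which is the namespace separation described above: data.log and model.log are two different variables.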

Hopefully these examples help clarify things a bit.

Thanks for the detailed response. This is definitely what I was going for, but my wording was a bit off. Specifically the "calling functions in other files" part. Again, thanks a ton. I marked it as the correct answer so I hope I did that correctly. — user2208604, Aug 26 '14 at 22:20
