Multiprocessing & Pool in __main__ - how to get the output outside the __main__?

Question

Based on this answer (https://stackoverflow.com/a/20192251/9024698), I have to do this:

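from multiprocessing import Pool
from astropy.io import fits  # assumption: fits comes from astropy; the question does not show this import

def process_image(name):
    # Open the FITS file for this input name
    sci=fits.open('{}.fits'.format(name))
    # <process>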

if __name__ == '__main__':
    pool = Pool()                         # Create a multiprocessing Pool
    pool.map(process_image, data_inputs)  # process data_inputs iterable with pool

to multi-process a for loop.

However, how can I get the output of this and process it further if I want to?

Presumably it must be something like this:

if __name__ == '__main__':
    pool = Pool()                         # Create a multiprocessing Pool
    output = pool.map(process_image, data_inputs)  # process data_inputs iterable with pool
    # further processing

But then this means that I have to put all the rest of my code in __main__ unless I write everything in functions which are called by __main__?

The notion of __main__ has always been pretty confusing to me.

I would just "write everything in functions which are called by `__main__`". In languages like Java or C++ that *require* a `main` entry point, the entire program is `main` calling other functions. – Carcigenicate Nov 04 '19 at 23:21
@Carcigenicate, ok I see. So literally everything should be encapsulated in functions simply because I want to do multi-processing. That does not sound very reasonable to me from a higher-level viewpoint, but it must make sense for Python at the lower level. – Outcast Nov 04 '19 at 23:34
That's a proper way to structure a program anyway. Ideally your program should already be broken down into functions, and you just need to call them from `main` when switching to using multiprocessing. Having everything as a top-level script is messy and causes problems as the code grows. – Carcigenicate Nov 04 '19 at 23:36
@Carcigenicate, sure, I agree from the viewpoint of finalised code. I am at the prototyping phase for now though haha. But yes, I see your point. – Outcast Nov 04 '19 at 23:38
Ya, if you're developing in a REPL or something, I'll admit, it is a bit of a pain. – Carcigenicate Nov 04 '19 at 23:41

score 2 · Answer 1 · answered Nov 04 '19 at 23:21


if __name__ == '__main__': is literally just "if this file is being run as a script, as opposed to being imported as a module, then do this". __name__ is a module-level variable that Python sets automatically: it is '__main__' when the file is run as a script, and the module's own name when the file is imported. Why it works this way is beyond the scope of this discussion, but suffice it to say it has to do with how Python evaluates source files top to bottom.
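A minimal sketch of that behaviour, assuming a hypothetical file called demo.py (the file name and the run_mode helper are illustrative, not something from the question):

```python
# demo.py (hypothetical file name, used only for illustration)

def run_mode(module_name):
    """Report how a module with this __name__ was loaded."""
    return "script" if module_name == "__main__" else "imported module"


# Python sets __name__ itself: "__main__" when the file is executed directly
# (python demo.py), or the module's own name when the file is imported.
print(__name__, "->", run_mode(__name__))

if __name__ == "__main__":
    # Reached only when the file is executed directly.
    print("running as a script")
```

Running the file directly takes the guarded branch; importing it from another file skips that branch but still executes everything else at the top level.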

In other words, you can put the other two lines anywhere you want, most likely in a function that you call elsewhere in the program, provided the call that starts the pool is ultimately made from under the if __name__ == '__main__': guard. That function can return output, or do further processing on it, or whatever else you happen to need.
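One way this could look, as a rough sketch rather than the definitive layout: square is a hypothetical stand-in for process_image, and run_pipeline and summarise are made-up names for the pool call and the "further processing" step.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical stand-in for process_image from the question. It lives at
    # module level so that worker processes can look it up by name.
    return x * x


def run_pipeline(inputs, processes=2):
    # The two Pool lines from the question, moved into an ordinary function
    # that hands the results back to its caller.
    with Pool(processes=processes) as pool:
        return pool.map(square, inputs)


def summarise(results):
    # "Further processing" of the collected output, also in a plain function.
    return sum(results)


if __name__ == "__main__":
    # Only the entry point sits under the guard; everything else is functions.
    output = run_pipeline([1, 2, 3, 4])
    print(summarise(output))
```

The guard ends up holding only a couple of lines, and the rest of the program stays in functions that can be imported and tested separately.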


Green Cloak Guy


Thanks, I see. So basically ALL my code should be put into functions which are called by `__main__`? As I said above, I find it a bit excessive simply because I want to do multi-processing but it must make sense for python somehow. – Outcast Nov 04 '19 at 23:36
@PoeteMaudit either all your code is reachable from `if __name__ == '__main__':`, or reachable from the things that are reachable from that, and so on. Your program has one entry point, and anything that isn't connected to it by at least *some* execution chain just won't run. It's like `int main() {}` in C or Java, it's where the program starts, but it's not special beyond that. – Green Cloak Guy Nov 05 '19 at 02:12
