I have the following example code:

def my_function_caller():
    samples = []
    for t in range(2):
        samples.append(my_function(t))
    return samples

def my_function(t):
    results = []
    if __name__ == '__main__':
        pool = Pool()
        results = pool.map(task, range(5))
        pool.close()
        pool.join()
    A = results[0]
    return A


def task(k):
    time.sleep(1)
    result = k
    return result

When I call my_function(t), I get the following error:

    A = results[0]
IndexError: list index out of range

I expected pool.close() and pool.join() to make the program wait for all processes to finish, so that I could use the jointly computed "results" afterwards. How can I force the program to wait, or, more generally, how can I use "results" directly inside the function "my_function"?

EDIT: To recreate the error, this is the complete code that I am running (simply copied and pasted). The Python file, called main.py, is located in a standard Python project, and I am using Windows.

from multiprocessing import Pool
import time

def my_function_caller():
    samples = []
    for t in range(2):
        samples.append(my_function(t))
    return samples

def my_function(t):
    results = []
    if __name__ == '__main__':
        pool = Pool()
        results = pool.map(task, range(5))
        pool.close()
        pool.join()
    A = results[0]
    return A


def task(k):
    time.sleep(1)
    result = k
    return result

a = my_function_caller()

As additional information, I get the error message

        A = results[0]
IndexError: list index out of range

several times, not just once.

Felix P.
  • Please update your question with the code that calls `my_function()`. Also, try your code again, but omit the first line of your function, ie omit `results = []`. – quamrana Aug 23 '21 at 09:49
  • This is *not* a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). I should be able to copy and paste what you post and reproduce the problem *without adding anything*. – Booboo Aug 23 '21 at 10:02
  • @quamrana When I removed results = [], I got the error message: UnboundLocalError: local variable 'results' referenced before assignment. I added the function that is calling my_function(t). – Felix P. Aug 23 '21 at 10:08
  • Tried to reproduce this code. `a = my_function_caller(); print(a)` returning `[0, 0]` without any errors. `a = my_function(1); print(a)` returning `0` – rzlvmp Aug 23 '21 at 10:33
  • Your error message, in the comment above, suggests that you don't call `my_function` in `__main__`. Add a `print(__name__)` at the beginning of the function to check what is happening. (Code works fine here.) – Timus Aug 23 '21 at 10:40
  • @Timus I added `print(__name__)` as you suggested. It returned `__main__` at first, then `__mp_main__` after that, right before I get the error messages. My Python file is not called main; however, this can't be the problem, can it? – Felix P. Aug 23 '21 at 10:43
  • I cannot reproduce your error. Please update your question with the code that calls `my_function_caller()` – quamrana Aug 23 '21 at 10:49
  • Ok, that’s not going to work, because ‘a = my_function_caller()’ will be called by accident by multiprocessing. That will explain your error. – quamrana Aug 23 '21 at 10:55
  • I just asked someone else to run it on his Linux computer and it worked there without problem. It seems to be a Windows issue. @Timus are there any downsides to that quick fix? Otherwise, I'll just use that as it actually does not give me an error. – Felix P. Aug 23 '21 at 10:58
  • The question would be how you call the script. – swaggg Aug 23 '21 at 10:59
  • And by the way, is your intention for the script to terminate with an error if it's imported? – swaggg Aug 23 '21 at 11:03
  • It's just that: A _quick_ fix. Off the top of my head I don't see downsides. But I wouldn't use it and look for a more systemic solution. Do you need to pack the mp-part into a function? I would rather use a `if __name__ == '__main__':` at zero indentation level in the script and do all `my_function`-stuff afterwards, directly. – Timus Aug 23 '21 at 11:03
  • I am afraid that I need to use the if __name... part in a function. Later on I would like to call that function from a different python file in the project. Eventually, I will run the program on a Linux virtual machine and it is pretty weird that I don't get an error when running the script on Linux. – Felix P. Aug 23 '21 at 11:06
  • Ohhh ok, now I get it, this is Windows-specific behaviour. – swaggg Aug 23 '21 at 11:16
  • Very weird indeed, thank God I don't use Windows. See this answer: https://stackoverflow.com/a/53924048/10499398 – swaggg Aug 23 '21 at 11:17
  • The solution therefore is to put the if statement at the top of your script, otherwise Windows will still execute the rest of the program and then cause an unhandled error. – swaggg Aug 23 '21 at 11:20
  • https://bugs.python.org/issue43306 – Aaron Aug 23 '21 at 17:21

2 Answers

It worked for me on Linux. However, I find the structure a little messy. Consider, for example, the following layout, which makes the problem easier to debug:

from multiprocessing import Pool
import time


def my_function_caller():
    samples = []
    for t in range(2):
        samples.append(my_function(t))
    return samples


def my_function(t):
    with Pool(5) as p:
        results = p.map(task, range(5))
    A = results[0]
    return A


def task(k):
    time.sleep(1)
    result = k
    return result


if __name__ == "__main__":
    a = my_function_caller()
    print(a)
Roman Pavelka

It's not really my answer but I'm going to post it as an answer anyway. Windows displays some really messed up behaviour as mentioned here:

python multiprocessing on Windows

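A worker process is only supposed to call your function, but on Windows, where new processes are spawned rather than forked, each worker re-imports the main module and so ends up executing the whole script's top-level code all over again.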
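
A minimal sketch for observing this yourself, assuming the script is saved and run directly as a file. The helper names `report_module_name` and `worker_module_names` are made up for illustration and are not part of the question's code. Each worker reports the value of `__name__` that it sees, which matches the `print(__name__)` experiment from the comments:

```python
import multiprocessing as mp


def report_module_name(_):
    """Runs inside a worker and returns the module name the worker sees."""
    return __name__


def worker_module_names(start_method=None):
    """Ask two workers for their __name__.

    start_method=None uses the platform's default start method.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(report_module_name, range(2))


if __name__ == "__main__":
    # Only the process that was started directly reaches this block.
    print("parent sees:", __name__)
    print("default start method:", mp.get_start_method())
    print("workers see:", worker_module_names())
```

With the spawn start method (the only one available on Windows) the workers should report `__mp_main__`, which is why an `if __name__ == '__main__':` check placed inside a function is skipped in them. Calling `worker_module_names("spawn")` from the guarded block is one way to try the same behaviour on Linux.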

You have to prepend the entry point with if __name__ == "__main__":

if __name__ == "__main__":
    a = my_function_caller()

Separately, you should still use if __name__ == "__main__" or __name__ == "__mp_main__": in the function that starts the pool (it uses processes, not threads), but put the check either at the top of the function or at least make sure the program won't try to access a non-existent value when the module is imported.

swaggg