
I use Rust to speed up a data processing pipeline, but I also have to run some existing Python code as-is, and I want to parallelize it. Following a discussion in another question, creating multiple Python processes seems a viable approach given my project's specific constraints. However, running the code below results in an infinite loop, and I can't work out why.

use cpython::Python;

fn main() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    py.run(r#"
import sys
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    print('start')
    sys.argv=['']
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    "#, None,None).unwrap();
}

Output (continues until Ctrl-C):

start
start
start
start
start
start
start
start

EDIT

As mentioned in the comments below, I gave up on trying to create processes from the Python code. The interactions between Windows, the Python multiprocessing module, and the way processes are created from Rust are too obscure to manage properly. Instead, I will create and manage the processes from Rust. The code is therefore more textbook:

use std::process::Command;
fn main() {
    let mut cmd = Command::new("python");
    cmd.args(&["-c", "print('test')"]); 
    let process = cmd.spawn().expect("Couldn't spawn process.");
    println!("{:?}", process.wait_with_output().unwrap());
}
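To parallelize this way, several interpreters can be started before any of them is waited on. The following is only a minimal sketch under a few assumptions: `python` is on the `PATH`, and the helper `worker_command`, the `WORKER` snippet and the worker names are hypothetical, made up for illustration. The name is passed as a command-line argument so that no quoting inside the Python source is needed.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

// Python snippet run by every worker; with `-c`, the first extra
// command-line argument is available as sys.argv[1].
const WORKER: &str = "import sys; print('hello', sys.argv[1])";

// Hypothetical helper: build (but do not start) one worker command.
// `interpreter` is assumed to be a Python executable name or path.
fn worker_command(interpreter: &str, name: &str) -> Command {
    let mut cmd = Command::new(interpreter);
    cmd.args(&["-c", WORKER, name]).stdout(Stdio::piped());
    cmd
}

fn main() -> io::Result<()> {
    let names = ["bob", "alice", "carol"];

    // Spawn every worker first so that they all run concurrently.
    let mut children: Vec<(&str, Child)> = Vec::new();
    for name in names.iter() {
        children.push((*name, worker_command("python", name).spawn()?));
    }

    // Then collect the output of each worker in turn.
    for (name, child) in children {
        let output = child.wait_with_output()?;
        println!("{}: {}", name, String::from_utf8_lossy(&output.stdout).trim());
    }
    Ok(())
}
```

Because each child is a plain interpreter process started by Rust, the multiprocessing start method and `__name__` no longer play any role.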
Amaury
  • I can't reproduce this. Using the latest cpython crate (v0.4.1) and the latest Python (3.8.1). – Thomas Feb 25 '20 at 10:17
  • I suppose `__name__` is always set to `"__main__"` in your case, even in your subprocesses. You could work around this by putting that code in a `main()` function in Python and calling that function from Rust as per [this example](https://github.com/dgrunwald/rust-cpython/issues/121#issuecomment-389946108). – Thomas Feb 25 '20 at 10:20
  • You are right. I added `print(__name__)` to the Python code, and each process prints `__main__`. If I run the code in pure Python, the `__name__` in child processes is `__mp_main__`, which prevents the loop. I have checked the link you sent, but couldn't understand how to apply it to my example. – Amaury Feb 25 '20 at 13:31
  • It wouldn't hurt to just yank out all the `__main__` stuff and do a `print(hello)`. You don't really need that `if main` either: it is there to guard against code execution on an import, which isn't necessary for you. Once print works, try adding subprocess. The point is to simplify the Python end as much as possible and then build it back up. If you have issues with hello, you know you need to fix the Rust call config. If not, gradually add minimal code to build up to your desired solution in Python. – JL Peyret Feb 25 '20 at 17:34
  • You are on Windows, aren’t you? – ead Feb 25 '20 at 18:42
  • You have created a frozen module. There is an explanation of what is going wrong (even though Cython + C is used there to create the frozen module, not Rust): https://stackoverflow.com/a/47360452/5769463. Here is a solution that works on Windows only: https://stackoverflow.com/a/47410972/5769463 – ead Feb 25 '20 at 19:27

1 Answer


I can't reproduce this; for me it just prints `start` and then `hello bob` as expected. For whatever reason, it seems that in your case, `__name__` is always equal to `"__main__"` and you get this infinite recursion. I'm using the cpython crate version 0.4.1 and Python 3.8.1 on Arch Linux.

A workaround is to not depend on `__name__` at all, but to instead define your Python code as a module with a `main()` function and then call that function:

use cpython::{Python, PyModule};

fn main() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    let module = PyModule::new(py, "bob").unwrap();
    py.run(r#"
import sys
from multiprocessing import Process

def f(name):
    print('hello', name)

def main():
    print('start')
    sys.argv=['']
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    "#, Some(&module.dict(py)), None).unwrap();
    module.call(py, "main", cpython::NoArgs, None).unwrap();
}
Thomas
  • I assume you use Linux, which uses fork as its start method. The OP might be on Windows or macOS, which use spawn as the start method (https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). There is a problem with frozen modules and spawn, see for example https://stackoverflow.com/q/47325297/5769463 – ead Feb 25 '20 at 19:30
  • Yes, that's right, I'm on Windows. I made sure I'm on cpython 0.4.1 and Python 3.8.1 as @thomas mentioned, but it still gives an infinite loop with the message ```thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { ptype: , pvalue: Some(PicklingError("Can't pickle : import of module 'bob' failed")), ptraceback: Some() }', src\libcore\result.rs:1084:5``` – Amaury Feb 26 '20 at 07:27
  • Try calling `multiprocessing.freeze_support()` (but only on Windows) as explained in the answer @ead linked to: https://stackoverflow.com/a/47360452/14637 – Thomas Feb 26 '20 at 07:34
  • Did that. `freeze_support` seems to process what's in `sys.argv`, however it seems that variable is not set in the child processes, so it still recurses. I've then tried adding `mp.set_executable('c:\\venv\\python.exe')` so that it would launch a python interpreter instead of the main executable. It doesn't recurse, but generates another error ```unknown option --multiprocessing-fork usage: c:\test\target\debug\test.exe [option] ... [-c cmd | -m mod | file | -] [arg] ... Try `python -h' for more information.``` – Amaury Feb 26 '20 at 09:58
  • Hmm. Instead of freezing, you could also try putting the module in a separate file on disk and adjusting the interpreter's path so it can find it. You should then be able to load module `bob.py` using `PyModule::import(py, "bob")` and freezing should no longer be necessary. (Just guessing, I'm a bit out of my depth here.) – Thomas Feb 26 '20 at 11:21
  • I'll try that, thanks! Given the difficulty, at this stage it might be easier to create the processes using Rust's process library, start a Python interpreter in each of them, and send them the code to execute. – Amaury Feb 26 '20 at 13:56
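The idea in the last comment, starting an interpreter from Rust and sending it the code to execute, could look roughly like the sketch below. It assumes `python` is on the `PATH` and relies on `python -` reading its program from standard input; the helper name `run_script` is hypothetical and not part of any library.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

// Hypothetical helper: start `interpreter`, send it `script` over stdin,
// and wait for it to finish. `interpreter` is assumed to be a Python
// executable; the `-` argument tells it to read the program from stdin.
fn run_script(interpreter: &str, script: &str) -> io::Result<Output> {
    let mut child = Command::new(interpreter)
        .arg("-")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    {
        // Taking stdin out of the child means it is closed when this
        // scope ends, which signals end-of-input to the interpreter.
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(script.as_bytes())?;
    }

    child.wait_with_output()
}

fn main() -> io::Result<()> {
    let output = run_script("python", "print('test')\n")?;
    println!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```

Writing the whole script before reading any output is fine for short scripts like this one; a long-running worker that produces a lot of output while still reading input would need the reading and writing to happen concurrently.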