Running Python code in parallel from Rust with rust-cpython

Question

I'm trying to speed up a data pipeline using Rust. The pipeline contains bits of Python code that I don't want to modify, so I'm trying to run them as-is from Rust using rust-cpython and multiple threads. However, the performance is not what I expected, it's actually the same as running the python code bits sequentially in a single thread.

Reading the documentation, I understand when invoking the following, you actually get a pointer to a single Python interpreter that can only be created once, even if you run it from multiple threads separately.

    let gil = Python::acquire_gil();
    let py = gil.python();

If that's the case, it means the Python GIL is actually preventing all parallel execution in Rust as well. Is there a way to solve this problem?

Here's the code of my test:

use cpython::Python;
use std::thread;
use std::sync::mpsc;
use std::time::Instant;

#[test]
fn python_test_parallel() {
    let start = Instant::now();

    let (tx_output, rx_output) = mpsc::channel();
    let tx_output_1 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 1, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_1.send(()).unwrap();
    });

    let tx_output_2 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 2, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_2.send(()).unwrap();
    });

    // Receivers to ensure all threads run
    let _output_1 = rx_output.recv().unwrap();
    let _output_2 = rx_output.recv().unwrap();
    println!("{:37} : {:6.1} ms", "Total time, parallel", (Instant::now() - start).as_secs_f64() * 1000f64);
}

Python code can only be executed from a single thread at a time. There is nothing preventing Rust code from being run in other threads while the Python code is running in one thread. — Jmb, Feb 10 '20 at 13:09

score 2 · Accepted Answer · answered Feb 11 '20 at 10:32

The CPython implementation of Python does not allow executing Python bytecode in multiple threads at the same time. As you note yourself, the global interpreter lock (GIL) prevents this.

We don't have any information on what exactly your Python code is doing, so I'll give a few general hints how you could improve the performance of your code.

If your code is I/O-bound, e.g. reading from the network, you will generally get nice performance improvements from using multiple threads. Blocking I/O calls will release the GIL before blocking, so other threads can execute during that time.
Some libraries, e.g. NumPy, internally release the GIL during long-running library calls that don't need access to Python data structures. With these libraries, you can get performance improvements for multi-threaded, CPU-bound code even if you only write pure Python code using the library.
If your code is CPU-bound and spends most of its time executing Python bytecode, you can often use multipe processes rather than threads to achieve parallel execution. The multiprocessing in the Python standard library helps with this.
If your code is CPU-bound, spends most of its time executing Python bytecode and can't be run in parallel processes because it accesses shared data, you can't run it in multiple threads in parallel – the GIL prevents this. However, even without the GIL, you can't just run sequential code in parallel without changes in any language. Since you have concurrent access to some data, you need to add locking and possibly make algorithmic changes to prevent data races; the details of how to do this depend on your use case. (And if you don't have concurrent data access, you should use processes instead of threads – see above.)

Beyond parallelism, a good way to speed up Python code with Rust is to profile your Python code, find the hot spots where most of the time is spent, and rewrite these bits as Rust functions that you call from your Python code. If this doesn't give you enough of a speedup, you can combine this approach with parallelism – preventing data races is generally easier to achieve in Rust than in most other languages.

Thanks a lot. The situation matches the third option in your list, the code is a mix of Pandas and Numpy operations that are CPU-bound. — Amaury, Feb 11 '20 at 13:52
I will try adding generated wrap-around Python Python to use [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). The advice on rewriting Python bits in Rust is also good, but for various reasons this would have deep-reaching impacts on the projects, so I'll keep this as a second-line option. — Amaury, Feb 11 '20 at 13:58
Actually, a first step before rewriting the hot functions in Rust is to look at their implementation and see whether it can be improved in Python. — Sven Marnach, Feb 11 '20 at 14:45

score 0 · Answer 2 · answered Nov 27 '22 at 09:41

0

If you use py03 bindings you can use the allow_threads method and callbacks to free the GIL for faster parralelism: https://pyo3.rs/v0.13.2/parallelism.html

answered Nov 27 '22 at 09:41

Happy Machine

987
8
30

Running Python code in parallel from Rust with rust-cpython

2 Answers2

Linked