
I am using pybind11 and trying to use OpenMP with it. I call a C++ function from Python through the pybind11 bindings (holding the GIL), then run a multithreaded for loop with OpenMP in C++, in which each thread calls a Python function. To do so, I first release the GIL with `py::gil_scoped_release release;` just before the parallel region, and reacquire it after the parallel region with `py::gil_scoped_acquire acquire;`.
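In pybind11 terms, the pattern I am describing would look roughly like the sketch below (the function and argument names are illustrative only; whether per-thread GIL acquisition like this is valid inside an OpenMP region is exactly what I am unsure about):

```cpp
#include <pybind11/pybind11.h>
#include <vector>

namespace py = pybind11;

void parallel_calls(std::vector<py::object>& items, py::object func) {
  // The main thread holds the GIL on entry; release it so that
  // worker threads are allowed to take it one at a time.
  py::gil_scoped_release release;

  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(items.size()); i++) {
    // Each OpenMP thread must hold the GIL before touching any
    // Python object; gil_scoped_acquire registers the native
    // thread with the single interpreter if needed.
    py::gil_scoped_acquire acquire;
    items[i].attr("attribute") = func().cast<int>();
  }

  // The destructor of `release` reacquires the GIL for the
  // main thread when this scope ends.
}
```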

The problem is inside the multithreaded for loop: I need an interpreter available to each thread. My question is: how do I do that?

I first thought about using `py::scoped_interpreter guard{};`, but the documentation says:

> Creating two concurrent scoped_interpreter guards is a fatal error. So is calling initialize_interpreter() for a second time after the interpreter has already been initialized.
>
> Do not use the raw CPython API functions Py_Initialize and Py_Finalize as these do not properly handle the lifetime of pybind11's internal data.

The second point is quite a problem, as the additional documentation I found only uses the raw Python C API and does not deal with pybind11 objects (1, 2, 3).
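As far as I understand it, the C-API pattern those references describe is roughly the following (a sketch only, untested; the function name is my own):

```c
#include <Python.h>

void worker_body(void) {
  /* Each native thread must register itself with the (single)
     interpreter and take the GIL before calling any Python API. */
  PyGILState_STATE gstate = PyGILState_Ensure();

  /* ... call into Python here, e.g. PyObject_CallObject(...) ... */

  /* Release the GIL and restore the previous thread state. */
  PyGILState_Release(gstate);
}
```

but this stays at the level of `PyObject*` handles, not pybind11 wrappers.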

I have looked at the documentation, but it doesn't address my problem exactly, because it doesn't call Python from within the multithreaded region. I have also looked at this (4), but there the interpreter is driven from C++, not from Python.

Any hint would be highly appreciated, as I have been blocked on this problem for quite some time. Here is a sample of code to illustrate the problem.

```cpp
py::object create_seq(py::object self) {

  std::vector<py::object> dict = self.cast<std::vector<py::object>>();

  py::gil_scoped_release release;

  #pragma omp parallel for ordered schedule(dynamic)
  for (unsigned int i = 0; i < dict.size(); i++) {
    // ?? WHAT HAPPENS HERE ??
    std::cout << i << std::endl;
    #pragma omp ordered
    dict[i].attr("attribute") = open.attr("parallel")().cast<int>();
  }

  py::gil_scoped_acquire acquire;

  return self;
}

PYBIND11_MODULE(error, m) {

  m.doc() = "pybind11 module for iterating over generations";

  m.def("create_seq", &create_seq,
        "the function which creates a sequence");
}
```

Python code

```python
import error

class test():
    def __init__(self):
        self.attribute = None

def func():
    return 2

if __name__ == '__main__':
    dict = {}
    for i in range(50):
        dict[i] = test()
    pop = error.create_seq(list(dict.values()))
```

Compiled with

```shell
g++ -O3 -Wall -shared -std=c++14 -fopenmp -fPIC `python3 -m pybind11 --includes` openmp.cpp -o error.so
```
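For reference, here is a plain-Python sketch (no pybind11 involved; the helper names are my own) of the underlying constraint: the GIL lets only one thread execute Python bytecode at a time, so splitting CPU-bound Python work across threads gives the same results without any concurrent execution of the bytecode:

```python
import threading
import time

def cpu_bound(n):
    # Pure-Python loop: the executing thread holds the GIL throughout.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_serial(n, workers):
    # One thread does all the work, one call after another.
    return [cpu_bound(n) for _ in range(workers)]

def run_threaded(n, workers):
    # Several OS threads, but the GIL serializes the bytecode.
    results = [None] * workers

    def worker(idx):
        results[idx] = cpu_bound(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    n, workers = 200_000, 4
    t0 = time.perf_counter()
    serial = run_serial(n, workers)
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    threaded = run_threaded(n, workers)
    t_threaded = time.perf_counter() - t0

    # Same results; on CPython the threaded version is typically no
    # faster (often slower, due to lock contention).
    print(f"serial: {t_serial:.3f}s  threaded: {t_threaded:.3f}s")
```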
    Doesn't look right to me: the way you wrote that code, you have only `gstate`, with all threads writing into it. If `PyGILState_Ensure` properly blocks, it may well work, but I wouldn't rely on it. – Wim Lavrijsen Jan 15 '20 at 18:05
  • I've written the code in the way you recommend, but I really don't understand why it takes more time than with no parallelization, because Py_BEGIN_ALLOW_THREADS releases the GIL and then PyGILState_Ensure just ensures that the GIL is held per thread. (I will perhaps delete this question and rewrite a new one, as the question isn't the same anymore) – Joachim Jan 15 '20 at 21:35
  • and is this a parallelization issue or a hyperthreading issue? Do you get the same with and without hyperthreading? – camelccc Jan 16 '20 at 00:56
  • @camelccc it's a hyperthreading issue, as the code works, but I get the same output with and without hyperthreading, with an even larger execution time with hyperthreading – Joachim Jan 16 '20 at 06:02
  • You are doing all job under lock, i.e. only one thread is *active* at any time. How is it supposed to be faster than serial? – Sergei Jan 16 '20 at 11:10
  • @Sergei But doesn't Py_BEGIN_ALLOW_THREADS precisely release the lock ? – Joachim Jan 16 '20 at 12:25
  • `PyGILState_Ensure` acquires the lock – Sergei Jan 16 '20 at 12:36
  • @Sergei Yes, OK, but if it's within the parallel region, then it just acquires the lock for a thread running in parallel with others. My way of understanding this is just like having the GIL within threads running in parallel, so there shouldn't be any problem with that, as the Python objects acquired belong to each thread and are not shared between threads. I don't know if my explanation is clear, please tell me if it's not. – Joachim Jan 16 '20 at 12:57
  • No, there is just one GIL – Sergei Jan 16 '20 at 13:02
  • No, one GIL per process. Period. – Sergei Jan 16 '20 at 13:09
  • @Sergei I'm sorry, I made a mistake. I meant: is there a way I could create "several" GILs for several threads running in parallel and then invoke the lock per thread? – Joachim Jan 16 '20 at 13:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/206089/discussion-between-joachim-and-sergei). – Joachim Jan 16 '20 at 18:35
  • First, thanks to all for your replies, the source of the problem is getting always clearer. I have updated my question with what I have learned and read this afternoon – Joachim Jan 16 '20 at 18:53

0 Answers