There are probably ways to achieve this in pure Cython, but I find it easier to use OpenMP from C/C++ and to call that functionality from Cython (verbatim C code, available since Cython 0.28, makes this quite convenient).
One thing is clear: to achieve your goal you need some kind of synchronization between threads, which might impact performance. My strategy is simple: every thread reports "done" after each task, all finished tasks are counted, and their number is printed from time to time. The counter is protected with a mutex/lock:
%%cython -c=/openmp --link-args=/openmp
# The flags above are for MSVC; with GCC or Clang use -fopenmp instead.
cdef extern from * nogil:
    r"""
    #include <omp.h>
    #include <stdio.h>

    static omp_lock_t cnt_lock;
    static int cnt = 0;

    void reset(){
        omp_init_lock(&cnt_lock);
        cnt = 0;
    }

    void destroy(){
        omp_destroy_lock(&cnt_lock);
    }

    void report(int mod){
        omp_set_lock(&cnt_lock);
        // start protected code:
        cnt++;
        if(cnt%mod == 0){
            printf("done: %d\n", cnt);
        }
        // end protected code block
        omp_unset_lock(&cnt_lock);
    }
    """
    void reset()
    void destroy()
    void report(int mod)

from cython.parallel import prange

def prange_with_report(int n):
    reset()  # reset counter and init lock
    cdef int index
    # report roughly every 10%; at least 1, so that cnt % mod is defined for n < 10
    cdef int mod = max(n // 10, 1)
    for index in prange(n, nogil = True, num_threads = 5):
        report(mod)
    destroy()  # release lock
Now calling prange_with_report(100) would print done: 10\n, done: 20\n, ..., done: 100\n.
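If the lock itself turns out to be the bottleneck, a lighter counter might be worth trying. The following is only a sketch of one such alternative, not part of the approach above: it swaps the lock for an OpenMP atomic capture, which needs OpenMP 3.1 or newer (so it will not build with MSVC's plain /openmp, which implements OpenMP 2.0). The names cnt_atomic, report_atomic and run_atomic are hypothetical. Because the printf happens outside the atomic update, the "done" lines can appear out of order.

```c
#include <assert.h>
#include <stdio.h>

static int cnt_atomic = 0;  /* shared counter, updated atomically */

/* Hypothetical lock-free counterpart of report(); returns the new count. */
static int report_atomic(int mod) {
    int mine;
    assert(mod > 0);  /* mod == 0 would make the modulo undefined */
    /* increment and read back the new value in one atomic step */
    #pragma omp atomic capture
    mine = ++cnt_atomic;
    if (mine % mod == 0) {
        printf("done: %d\n", mine);
    }
    return mine;
}

/* Hypothetical driver: counts n iterations and returns the final count. */
static int run_atomic(int n, int mod) {
    int index;
    cnt_atomic = 0;
    #pragma omp parallel for
    for (index = 0; index < n; index++) {
        /* ... real work for index would go here ... */
        report_atomic(mod);
    }
    return cnt_atomic;
}
```

Compiled without OpenMP support, the pragmas are ignored and the code simply runs serially, which makes it easy to check the counting logic on its own.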
As a slight optimization, report could be called not for every index but only, for example, when index%100==0. This has less impact on performance, but the reporting also becomes less precise.
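A rough C sketch of that optimization, under a few assumptions of mine: report_step and run_sparse are hypothetical names, the counter is guarded by an OpenMP critical section instead of the explicit lock just to keep the example short, and every reporting call adds step to the counter so that the printed number stays close to the number of finished indices. The value is only an estimate; when n is not a multiple of step it can overshoot n.

```c
#include <assert.h>
#include <stdio.h>

static int cnt = 0;           /* estimated number of finished indices */
static int last_printed = 0;  /* last value printed, kept for inspection */

/* Hypothetical variant of report(): accounts for step indices at once. */
static void report_step(int step, int mod) {
    assert(step > 0 && mod > 0);
    #pragma omp critical(progress_cnt)
    {
        int before = cnt;
        cnt += step;
        /* print whenever the counter crosses a multiple of mod */
        if (cnt / mod != before / mod) {
            last_printed = cnt;
            printf("done: about %d\n", cnt);
        }
    }
}

/* Hypothetical driver: only every step-th index enters the critical section. */
static int run_sparse(int n, int step, int mod) {
    int index;
    cnt = 0;
    last_printed = 0;
    #pragma omp parallel for
    for (index = 0; index < n; index++) {
        /* ... real work for index would go here ... */
        if (index % step == 0) {
            report_step(step, mod);
        }
    }
    return cnt;
}
```

With n = 1000, step = 100 and mod = 200, only 10 of the 1000 iterations synchronize, at the price of progress being reported in steps of 100.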