A return code of -11 might mean that the C program is at fault, e.g., you are starting too many subprocesses and that causes SIGSEGV in the C executable. On POSIX, a negative returncode -N means that the child was terminated by signal N, and signal 11 is SIGSEGV on Linux. You can limit the number of concurrent subprocesses using multiprocessing.pool.ThreadPool, concurrent.futures.ThreadPoolExecutor, or threading + Queue-based solutions:
#!/usr/bin/env python
from multiprocessing.dummy import Pool  # uses threads
from subprocess import Popen, PIPE

def get_url(url):
    # run the executable for one url; capture its output and exit status
    p = Popen(["executable", url], stdout=PIPE, stderr=PIPE, close_fds=True)
    output, error = p.communicate()
    return url, output, error, p.returncode

urls = ["url1", "url2"]  # placeholder input; replace with the real urls
pool = Pool(20)  # limit number of concurrent subprocesses
for url, output, error, returncode in pool.imap_unordered(get_url, urls):
    print("%s %r %r %d" % (url, output, error, returncode))
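The same limit can be sketched with concurrent.futures.ThreadPoolExecutor. This is only an illustration: run_one, run_all and the command parameter are hypothetical names, and "executable" is the same placeholder as in the script above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from subprocess import run, PIPE

def run_one(arg, command=("executable",)):
    """Run `command arg` and return (arg, stdout, stderr, returncode)."""
    # "executable" is a placeholder for the real C program
    p = run([*command, arg], stdout=PIPE, stderr=PIPE)
    return arg, p.stdout, p.stderr, p.returncode

def run_all(args, max_workers=20, command=("executable",)):
    """Run at most max_workers subprocesses at a time.

    Yields results in completion order, like imap_unordered.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_one, arg, command) for arg in args]
        for future in as_completed(futures):
            yield future.result()
```

Each worker thread blocks on its own child process, so max_workers is an upper bound on the number of subprocesses alive at once. Usage would look like: for url, output, error, returncode in run_all(urls): ...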
Make sure the executable can be run in parallel, e.g., that it doesn't use some shared resource. To test, you could run in a shell:
$ executable url1 & executable url2
Could you please explain more about "you are starting too many subprocesses and it causes SIGSEGV in the C executable", and suggest a possible way to avoid it?
Possible problem:
- "too many processes"
- -> "not enough memory in the system or some other resource"
- -> "trigger the bug in the C code that otherwise is hidden or rare"
- -> "illegal memory access"
- -> SIGSEGV
The solution suggested above is:
- "limit number of concurrent processes"
- -> "enough memory or other resources in the system"
- -> "bug is hidden or rare"
- -> no SIGSEGV
See "What is SIGSEGV run time error in C++?". In short, your program is killed with that signal if it tries to access memory that it is not supposed to. Here's an example of such a program:
/* try to fail with SIGSEGV sometimes */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    char *null_pointer = NULL;
    srand((unsigned)time(NULL));
    if (rand() < RAND_MAX/2) /* simulate some concurrent condition
                                e.g., memory pressure */
        fprintf(stderr, "%c\n", *null_pointer); /* dereference null pointer */
    return 0;
}
If you run it with the above Python script, it will occasionally return -11.
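To make such a return code easier to read in logs, you could translate it into a signal name. A minimal sketch, assuming a POSIX system where Popen.returncode is -N when the child is killed by signal N; describe_returncode is a hypothetical helper name:

```python
import signal

def describe_returncode(returncode):
    """Return a human-readable description of a Popen.returncode value."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # on POSIX, -N means the child was terminated by signal N
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (-returncode, name)
    return "exited with status %d" % returncode
```

On Linux, describe_returncode(-11) reports SIGSEGV, which matches the crash in the C example.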
Also, p.returncode is not sufficient for debugging purposes. Is there any other option to get more debug info to get to the root cause?
I wouldn't rule out the Python side completely, but the problem is most likely in the C program. You could use gdb to get a backtrace and see where in the call stack the error comes from.
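One way to collect the backtrace without an interactive session is to run the program under gdb in batch mode. The sketch below assumes gdb is installed; gdb_backtrace_command and run_under_gdb are hypothetical helper names, and "executable" is still a placeholder for the C program:

```python
import shutil
from subprocess import run, PIPE, STDOUT

def gdb_backtrace_command(argv):
    """Build a gdb command line that runs argv and prints a backtrace."""
    # --batch: exit after the commands; -ex: gdb command to execute;
    # --args: everything after it is the program and its arguments
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args"] + list(argv)

def run_under_gdb(argv):
    """Run argv under gdb and return gdb's combined output as text."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed or not on PATH")
    p = run(gdb_backtrace_command(argv), stdout=PIPE, stderr=STDOUT)
    return p.stdout.decode(errors="replace")
```

For example, print(run_under_gdb(["executable", "url1"])) for a url that produced -11. The backtrace shows file names and line numbers only if the C program was compiled with debug information (e.g., gcc -g).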