
I have some Python code that leverages `ctypes.CDLL`; according to the docs, foreign function calls made this way release the GIL. That said, I am hitting bottlenecks I can't explain when profiling. If I run some trivial code using `time.sleep` or even `ctypes.windll.kernel32.Sleep`, the total time stays flat as long as the number of threads matches the number of tasks: if the task is to sleep 1 second, then 1 task on 1 thread and 20 tasks on 20 threads both take ~1 second to complete.
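
For reference, a minimal sketch of the kind of trivial test described above (using `time.sleep`; the `ctypes.windll.kernel32.Sleep` variant is assumed to behave the same way on Windows):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    # time.sleep releases the GIL while waiting, so threads can overlap;
    # ctypes.windll.kernel32.Sleep(1000) behaves the same way on Windows.
    time.sleep(1)

for n in (1, 20):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task) for _ in range(n)]
        for f in futures:
            f.result()
    print("{} tasks on {} threads: {:.2f}s".format(n, n, time.perf_counter() - start))
```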

Switching back to my code, it is not scaling out as expected: total time grows roughly linearly with the number of tasks. Profiling indicates time spent waiting in `acquire()` on a `_thread.lock`.

What are some techniques to dig further into where the issue is manifesting? Is `ThreadPoolExecutor` not the right choice here? My understanding was that it implements a basic thread pool and is no different from `ThreadPool` in `multiprocessing.pool`.
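
For example, a minimal sketch of one way to quantify this (the library and function names below are placeholders, since the real code is proprietary): wrap each `ctypes` call with a timer and compare the time spent inside the native call against the total wall time per thread.

```python
import ctypes
import threading
import time
from collections import defaultdict

lib = ctypes.CDLL("mylib")           # placeholder for the real proprietary library

# Per-thread accumulators; each worker thread only touches its own key.
native_seconds = defaultdict(float)
call_counts = defaultdict(int)

def timed_call(*args):
    """Call the native function and record how long was spent inside it."""
    name = threading.current_thread().name
    start = time.perf_counter()
    result = lib.fetch_data(*args)   # placeholder for the real exported function
    native_seconds[name] += time.perf_counter() - start
    call_counts[name] += 1
    return result

# After a ThreadPoolExecutor run, compare sum(native_seconds.values()) against the
# total wall time: if the native time dominates and still grows with the thread
# count, the serialization is inside (or behind) the C library, not in the Python layer.
```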

  • How much time does your threaded code actually spend inside the `ctypes.CDLL` calls relative to the total time the code runs? It would also be helpful if you could include some example code that demonstrates the behavior. – dano Oct 10 '14 at 15:58
  • Also note that the output of some profilers when profiling multi-threaded programs [can be misleading](http://stackoverflow.com/questions/24199026/how-to-speed-up-communication-with-subprocesses/24236117#24236117). – dano Oct 10 '14 at 16:01
  • I will try to quantify the time in CPython versus the C calls. That being said, the code is not mine and it's proprietary, so my hands are rather tied. Does `ThreadPoolExecutor` implement concurrency similar to asyncio, in that it requires code to be cooperative and is therefore maybe not suitable for my need? – Ritmo2k Oct 10 '14 at 16:20
  • `ThreadPoolExecutor` is just a list of `threading.Thread` objects with logic built around it, similar to `multiprocessing.pool.ThreadPool`. (See the [source code](https://hg.python.org/cpython/file/d9f71bc6d897/Lib/concurrent/futures/thread.py).) If the code you're running inside the threads is releasing the GIL, then multiple threads can run concurrently. But all the code running in the threads that *doesn't* release the GIL needs to run sequentially. – dano Oct 10 '14 at 16:23
  • Thank you for that clarification. `ProcessPoolExecutor` is also scaling equally badly. I am beginning to suspect this has more to do with the underlying C calls blocking against the application API they are mining data from. – Ritmo2k Oct 10 '14 at 16:28
  • That seems likely, if `ProcessPoolExecutor` is demonstrating the same behavior. – dano Oct 10 '14 at 16:29
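
To illustrate the GIL point made in the comments above, a minimal sketch contrasting a GIL-releasing call with pure-Python work under `ThreadPoolExecutor` (the function names are illustrative only):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def releases_gil():
    time.sleep(1)              # sleep releases the GIL, so the threads overlap

def holds_gil():
    total = 0
    for i in range(10 ** 7):   # pure-Python loop holds the GIL, so threads serialize
        total += i
    return total

for fn in (releases_gil, holds_gil):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fn) for _ in range(4)]
        for f in futures:
            f.result()
    print("{}: {:.2f}s total across 4 threads".format(fn.__name__, time.perf_counter() - start))
```

With the sleep, the four tasks finish in roughly one second; with the busy loop, the total is roughly four times the single-thread time, which is the pattern described in the question.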

0 Answers