1

I parse a big source code directory (100k files). I traverse every line in every file and do some simple regex matching. I tried threading this task to multiple threads but didn't get any speedup. Only multiprocessing managed to cut the time by 70%. I'm aware of the GIL death grip, but aren't threads supposed to help with IO bound access?

If the disk access is serial, how come several processes finish the job quicker?

susdu
  • 852
  • 9
  • 22

1 Answers1

1

Python "threads" permit independent threads of execution, but typically do not permit concurrency because of the global interpreter lock: only one thread can really be running at a time. This may be the reason why you only get a speedup with multiple processes, which do not share a global interpreter lock and thus can run concurrently.

Warren Dew
  • 8,790
  • 3
  • 30
  • 44