
I am trying to upgrade a Python script that runs an executable on Windows and manages its text output files, so that it uses multiple threads and can utilize more than one core. I have four separate versions of the executable, and each thread knows which one to access. That part works fine. I run into problems when the threads run simultaneously and each tries to open its own (different) output file, to make sure the run succeeded and to react depending on the contents of that output file.

Specifically, when running three threads, two will crash with the following error, while one continues to work:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 552, in __bootstrap_inner
    self.run()
  File "E:\HDA\HDA-1.0.1\Hm-1.0.1.py", line 782, in run
    conf = self.conf_file(Run)
  File "E:\HDA\HDA-1.0.1\Hm-1.0.1.py", line 729, in conf_file
    l = open(self.run_dir(Run)+Run, 'r').readlines()     #list of file lines
IOError: [Errno 2] No such file or directory: 'Path/to/Outputfile'

This results from the thread not running the executable correctly (which is why 'Path/to/Outputfile' was never created and so cannot be found). One of the threads does this correctly, while the other two do not. Is there a reason why I can't have multiple threads each running a different version of an executable?
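For illustration, here is a minimal sketch of the pattern each thread follows (the class name, paths, and argument handling are placeholders, not my actual script): launch the executable, wait for it to exit, then open the output file, with a defensive check added here so a failed run is reported instead of raising IOError later.

import os
import subprocess
import threading

class RunThread(threading.Thread):
    # Hypothetical sketch: each thread drives its own copy of the executable.
    def __init__(self, exe_path, run_dir, run_name):
        threading.Thread.__init__(self)
        self.exe_path = exe_path   # per-thread copy of the executable
        self.run_dir = run_dir     # directory the executable writes into
        self.run_name = run_name   # name of the expected output file

    def run(self):
        # Launch the executable and block until it exits.
        ret = subprocess.call([self.exe_path, self.run_name], cwd=self.run_dir)
        out_path = os.path.join(self.run_dir, self.run_name)
        # Check the exit code and the output file before reading.
        if ret != 0 or not os.path.exists(out_path):
            print 'run failed (exit code %d): %s' % (ret, out_path)
            return
        with open(out_path, 'r') as f:
            lines = f.readlines()
        # ... react to the contents of `lines` here ...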

Thursdays Coming

2 Answers


Python threads cannot currently execute Python code on more than one core at a time, because of the Global Interpreter Lock. Multithreading tends to be fraught with trouble anyway; it is better to use multiple processes if you can.
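A minimal sketch of that suggestion, with placeholder executable and output-file names rather than the asker's actual setup: each worker is a separate process that launches one copy of the executable and reports its exit code back through a queue.

import multiprocessing
import subprocess

def run_one(exe_path, out_file, results):
    # Each worker process launches its own executable and reports the exit code.
    ret = subprocess.call([exe_path, out_file])
    results.put((out_file, ret))

if __name__ == '__main__':
    results = multiprocessing.Queue()
    workers = []
    # Hypothetical per-worker executables and output files.
    for i in range(1, 5):
        exe = 'executable_copy_%d.exe' % i
        out = 'run_%d.out' % i
        p = multiprocessing.Process(target=run_one, args=(exe, out, results))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    while not results.empty():
        print results.get()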

Lawrence D'Oliveiro

I don't think the GIL by itself would kill this, unless opening a file gets you into some weird deadlock or spinlock condition. In general, threads are exactly what you want in I/O-bound cases like this. If anything, the fact that the threads are able to run concurrently probably contributes to the other threads failing, rather than to the file being opened successfully several times.

On slide fifteen of this presentation, the author points out that the GIL is released during blocking I/O calls to give other threads a chance to run.

The real problem here seems to be a lock on a file resource. I'm not really sure how Windows handles file locking, so I can't say why this error is creeping up, but it seems like only one thread actually gets a lock on a file resource.

The other poster's point about multiple cores and the GIL might be coming into play, in that you could have some sort of priority inversion where the other two threads are being starved. But I find that unlikely, given that the presentation above says a thread in the middle of a blocking operation releases the GIL for other threads.
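If file contention is the suspect, one quick diagnostic (a sketch with a placeholder function name, not the asker's code) is to serialize the open-and-read step behind a single threading.Lock; if the failures disappear, the threads really are colliding on file access.

import threading

# One shared lock guarding the open/read step across all threads.
read_lock = threading.Lock()

def read_output(path):
    # Only one thread at a time may open and read an output file.
    with read_lock:
        with open(path, 'r') as f:
            return f.readlines()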

One thought is to try multiprocessing. I suspect you'll have better luck reading the files across multiple processes than across threads.

Here is an example I wrote and tried on my OS X 10.7.3 machine; it opens a file named test whose contents are lol\n:

import multiprocessing

def open_file(path):
    # Read every line of the given file and return them as a list.
    with open(path, 'r') as file_obj:
        return file_obj.readlines()

if __name__ == '__main__':
    # The __main__ guard matters on Windows, where multiprocessing
    # spawns fresh interpreter processes that re-import this module.
    pool = multiprocessing.Pool(4)
    print pool.map(open_file, ['test'] * 4)

Here's the result when I execute it:

➜  ~ git:(master) ✗ python open_test.py
[['lol\n'], ['lol\n'], ['lol\n'], ['lol\n']]
mvanveen