
Help me, please.
I'm trying to call Python scripts from different C++ threads and have run into a problem.

main:

Py_Initialize();    
PyEval_InitThreads();
PyThreadState *mainThreadState = PyThreadState_Get();
PyEval_ReleaseLock();
PyInterpreterState *mainInterpreterState = mainThreadState->interp;
...
//creating threads with myThreadState per thread
    PyEval_AcquireLock();
    PyThreadState *myThreadState = PyThreadState_New(mainInterpreterState);
    PyEval_ReleaseLock();
//running threads
...
PyEval_RestoreThread(mainThreadState);
Py_Finalize();

run() function in thread object:

PyEval_AcquireLock();
PyThreadState_Swap(m_threadState);
...
script = "f = open('file_for_this_thread','w')\n"
         "print f\n"
         "f.write('111')\n"
         "print f.fileno()\n";
PyRun_SimpleString( script );
...
PyThreadState_Swap(NULL);
PyEval_ReleaseLock();

The first 'print f' displays the correct file info in each thread, but something is wrong: the second print produces the same output in different threads, and the output (if there is any) goes to one file instead of a separate file for each thread.
The file handles also become equal if I insert time.sleep(1) instead of f.write. Nothing crashes.

I also tried using PyGILState_Ensure/PyGILState_Release, with the same effect.

main:

Py_Initialize();
PyEval_InitThreads();
PyThreadState*  mainThreadState = PyEval_SaveThread();
...
//creating and running threads
...
PyEval_RestoreThread(mainThreadState);
Py_Finalize();

locker:

class TPyScriptThreadLocker {  // RAII guard: holds the GIL for the lifetime of the object
    PyGILState_STATE m_state;
public:
    TPyScriptThreadLocker(): m_state(PyGILState_Ensure()) {}
    ~TPyScriptThreadLocker() { PyGILState_Release(m_state); }
};

run() function in thread object:

TPyScriptThreadLocker lock;
...
script = "f = open('file_for_this_thread','w')\n"
         "print f.fileno()\n"
         "f.write('111')\n"
         "print f.fileno()\n";
PyRun_SimpleString( script );

I know that multithreading in Python is not a good idea in most cases, but right now I want to understand what is wrong with this code.

Python 2.7
info from http://www.linuxjournal.com/article/3641?page=0,2

source: http://files.mail.ru/9D4TEF pastebin: http://pastebin.com/DfFT9KN3

Anton N.
    Sorry if this sounds like a silly question, but how do you define the file names? In your code it just says `'file_for_this_thread'`. How exactly is that file name determined? – jogojapan Mar 02 '12 at 12:34
  • It's just a shortcut; actually there is a unique file path for each thread, like QString("d:\\%1"). The right files appear and the first f.write works well, but the second doesn't. The same happens if I insert time.sleep(1) before the f.fileno output, i.e., as I understand it, when two different threads start running concurrently. – Anton N. Mar 02 '12 at 13:36
  • Do you have a minimal example that demonstrates the problem? – James Mar 02 '12 at 13:51
  • Yes, with Qt, I'll upload it.. – Anton N. Mar 02 '12 at 13:53
  • I have a suspicion that what happens is that the Python variable for the file, `f`, is treated as a global by the Python interpreter and shared across all threads. All threads are using the same interpreter instance, after all. – jogojapan Mar 02 '12 at 14:06
  • To test this hypothesis, could you try using a different variable name for `f` in each thread? I.e. composing the contents of the `script` variable by putting `f1` in place of `f` when running the first thread, `f2` when running the second etc.? – jogojapan Mar 02 '12 at 14:15
  • It's something like this: the script opens the file 'd:\\py_test\\%1', where %1 is a string argument, 'MM' + thread_number (MM%1). So the files being created are named MM0, MM1. – Anton N. Mar 02 '12 at 14:22
  • I have renamed 'f' to 'somefilename'; nothing changed. – Anton N. Mar 02 '12 at 14:23
  • I mean a _different_ name for `f` in every thread – jogojapan Mar 02 '12 at 15:06
  • =) Yes, with a different name it works fine. But is this the intended behaviour of the interpreter? – Anton N. Mar 02 '12 at 15:32
  • Oh, I understand now: a global variable. Thank you! :) – Anton N. Mar 02 '12 at 15:36
  • As far as I can see, there will be no such problem if I use functions like PyImport_ExecCodeModule/PyObject_CallObject with different module names. I'll try that later. – Anton N. Mar 02 '12 at 15:40
  • Ok; I have bundled our findings in a proper answer, which you might want to _accept_. See below. – jogojapan Mar 02 '12 at 15:55

1 Answer


As analysed in my comments, the problem is caused by the fact that all threads in your code use the same instance of the Python interpreter, created and initialised here:

Py_Initialize();    

When the first thread runs the script defined here:

script = "f = open('file_for_this_thread','w')\n"
         "print f.fileno()\n"
         "f.write('111')\n"
         "print f.fileno()\n";

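this causes the Python interpreter to assign a global Python variable f. PyRun_SimpleString() executes its code in the namespace of the __main__ module, and because all threads use the same interpreter, they all share that namespace. Shortly after the first thread has bound f, another thread rebinds the same global variable. This might not have happened yet at the time of the first print f.fileno(), but it apparently happens before the second one.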
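The effect can be reproduced in pure Python (Python 3 syntax, unlike the Python 2.7 scripts in the question) by running per-thread scripts with exec() against one shared dictionary, which is roughly what PyRun_SimpleString() does with the __main__ namespace. This is a minimal sketch under stated assumptions: an integer stands in for the file object, a threading.Barrier stands in for the f.write()/time.sleep(1) pause, and the names barrier, seen and run_shared are hypothetical helpers introduced only for this illustration.

```python
import threading

# Per-thread script mirroring the question: only a literal differs per thread.
SCRIPT_TEMPLATE = (
    "f = {no}\n"                # stands in for f = open(<per-thread file>, 'w')
    "barrier.wait()\n"          # stands in for f.write('111') / time.sleep(1)
    "seen.append(({no}, f))\n"  # stands in for the second print f.fileno()
)


def run_shared(n_threads):
    """Run the script in n_threads threads against ONE shared globals dict."""
    shared = {
        # Barrier: every thread binds f before any thread reads it back.
        "barrier": threading.Barrier(n_threads, timeout=5),
        "seen": [],
    }

    def worker(no):
        # Same dict in every thread, like __main__.__dict__ for PyRun_SimpleString.
        exec(SCRIPT_TEMPLATE.format(no=no), shared)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared["seen"])


if __name__ == "__main__":
    for thread_no, f in run_shared(4):
        print(thread_no, f)  # f is the same value on every line
```

Every thread reads back the same value of f, namely whichever thread assigned it last, just as the second print f.fileno() showed the same handle in every thread.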

The solution is to ensure that no global variables are shared across threads (or to use a different instance of the Python interpreter in each thread, at great additional memory cost).
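One way to avoid sharing globals is to give each thread its own globals dictionary. In the C API this would presumably mean calling PyRun_String() with a fresh dictionary per thread (typically with __builtins__ set, e.g. from PyEval_GetBuiltins()) instead of PyRun_SimpleString(), which always runs in __main__'s namespace. The pure-Python sketch below models that idea with exec() (Python 3 syntax); an integer stands in for the file object, and barrier, seen and run_per_thread_globals are hypothetical names used only for the illustration.

```python
import threading

SCRIPT_TEMPLATE = (
    "f = {no}\n"                # stands in for f = open(<per-thread file>, 'w')
    "barrier.wait()\n"          # stands in for f.write('111') / time.sleep(1)
    "seen.append(({no}, f))\n"  # stands in for the second print f.fileno()
)


def run_per_thread_globals(n_threads):
    """Run the script in n_threads threads, each with its OWN globals dict."""
    barrier = threading.Barrier(n_threads, timeout=5)
    seen = []  # shared on purpose; list.append is safe to call from threads

    def worker(no):
        # Fresh dict per thread: `f` is bound in this dict only. Shared helpers
        # are passed in explicitly instead of living in a common namespace.
        globals_for_thread = {"barrier": barrier, "seen": seen}
        exec(SCRIPT_TEMPLATE.format(no=no), globals_for_thread)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)


if __name__ == "__main__":
    print(run_per_thread_globals(4))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The design trade-off is that anything the threads should share (here barrier and seen) must now be passed in explicitly, while everything the script binds stays private to its thread.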

Since currently the only global Python variable in your code is f, it suffices to use a different name for f in every thread. As your code gets more complicated, it will be better to define a Python function and use f (and any other variables you need) as local variables:

PyRun_SimpleString(
   "def myfunc(thread_no):\n"
   "    f = open('file_for_thread_%d' % thread_no,'w')\n"
   "    print f.fileno()\n"
   "    f.write('111')\n"               
   "    print f.fileno()\n"
 );

The above only needs to be run once, before any of the threads start.

In each thread, you would then simply do

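// QString::arg() substitutes %1; PyRun_SimpleString() expects a const char*
PyRun_SimpleString(QString("myfunc(%1)\n").arg(current_thread_no).toUtf8().constData());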

I.e. the threads would only call the Python function, and f would become a local variable.
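The function-based approach can be checked with a pure-Python sketch (Python 3 syntax): the definition is executed once in a shared namespace that plays the role of __main__, and each thread only executes "myfunc(N)" in that same namespace. An integer stands in for the file object, and barrier, seen and run_via_function are hypothetical names introduced for the illustration.

```python
import threading

# Executed once, before any thread starts (cf. the first PyRun_SimpleString call).
DEFINITION = (
    "def myfunc(thread_no):\n"
    "    f = thread_no\n"                 # local variable: one per call
    "    barrier.wait()\n"                # other threads run myfunc meanwhile
    "    seen.append((thread_no, f))\n"
)


def run_via_function(n_threads):
    # One namespace shared by all threads, playing the role of __main__.__dict__.
    main_dict = {"barrier": threading.Barrier(n_threads, timeout=5), "seen": []}
    exec(DEFINITION, main_dict)

    def worker(no):
        # Each thread only executes a call; nothing new is bound at module level.
        exec("myfunc(%d)\n" % no, main_dict)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(main_dict["seen"])


if __name__ == "__main__":
    print(run_via_function(4))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Because f is bound inside myfunc, each call gets its own local variable, so every thread reads back its own value even though all threads share the module namespace.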

jogojapan