The problem was that I was not using QThread
properly.
The result of printing

print("(Current Thread)", QThread.currentThread(), "\n")
print("(Current Thread)", int(QThread.currentThreadId()), "\n")

showed me that the PickleDumpingThread I created was running in the main thread, not in a separate thread.
The reason for this is that run() is the only method of QThread that executes in the separate thread, so a method like savePickle defined on the QThread still runs in the main thread.
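A small demonstration of this point (illustrative only, not my original code): a plain method on a QThread subclass runs in whichever thread calls it, while run() executes in the new thread.

import sys
from PyQt5.QtCore import QThread, QCoreApplication

class DemoThread(QThread):
    def run(self):
        print("run() thread:       ", QThread.currentThread())

    def savePickle(self):
        print("savePickle() thread:", QThread.currentThread())

app = QCoreApplication(sys.argv)
t = DemoThread()
t.savePickle()   # prints the main thread, even though it's a QThread method
t.start()        # run() is the only part that executes in the new thread
t.wait()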
First Solution
The proper way to use signals is with a Worker, as follows.
from PyQt5.QtCore import QThread, QObject, pyqtSignal
import pickle

class GenericThread(QThread):
    def run(self, *args):
        # print("Current Thread: (GenericThread)", QThread.currentThread(), "\n")
        self.exec_()

class PickleDumpingWorker(QObject):
    pickleDumpingSignal = pyqtSignal(dict)

    def __init__(self):
        super().__init__()
        self.pickleDumpingSignal[dict].connect(self.savePickle)

    def savePickle(self, signal_dict):
        # `file` is the target path, defined elsewhere
        pickle.dump(signal_dict["deque"], open(file, "wb"))

pickleDumpingThread = GenericThread()
pickleDumpingThread.start()

pickleDumpingWorker = PickleDumpingWorker()
pickleDumpingWorker.moveToThread(pickleDumpingThread)
from collections import deque
import time

class Analyzer():
    def __init__(self):
        self.cnt = 0
        self.dataDeque = deque(maxlen=10000)

    def onData(self, data):
        self.dataDeque.append({
            "data": data,
            "createdTime": time.time()
        })
        self.cnt += 1
        if self.cnt % 10000 == 0:
            pickleDumpingWorker.pickleDumpingSignal.emit({
                "action": "savePickle",
                "deque": self.dataDeque
            })
            # pickle.dump(dataDeque, open(file, 'wb'))
This solution worked (the pickle was dumped in a separate thread), but its drawback is that the data stream is still delayed by about 0.5~1 seconds because of the signal's emit() call.
I found that the best solution for my case is @PYPL's code, but it needs a few modifications to work.
Final Solution
The final solution is to modify @PYPL's following code
thread = PickleDumpingThread(self.dataDeque)
thread.start()
to
self.thread = PickleDumpingThread(self.dataDeque)
self.thread.start()
The original code has a runtime error. It seems the thread is garbage collected before it dumps the pickle, because there is no reference to it after the onData() function finishes.
Keeping a reference to the thread as self.thread solved this issue.
Also, it seems that the old PickleDumpingThread is garbage collected after a new PickleDumpingThread is assigned to self.thread (because the old one loses its reference).
However, I have not verified this claim (as I don't know how to inspect the currently active threads).
In any case, this solution solved the problem.
EDIT
My final solution has a delay too. Calling Thread.start() takes some amount of time.
The real final solution I chose is to run an infinite loop in the thread and monitor some of its variables to determine when to save the pickle. A bare infinite loop in the thread consumes a lot of CPU, so I added time.sleep(0.1) to reduce CPU usage.
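A minimal sketch of this polling approach (the class name PollingDumpThread and the shared dump_requested flag are my own illustrations, not the original code):

import pickle
import time
from PyQt5.QtCore import QThread

class PollingDumpThread(QThread):
    def __init__(self, data_deque, path):
        super().__init__()
        self.data_deque = data_deque
        self.path = path
        self.dump_requested = False   # the main thread sets this to True

    def run(self):
        while True:
            if self.dump_requested:
                pickle.dump(self.data_deque, open(self.path, "wb"))
                self.dump_requested = False
            time.sleep(0.1)           # avoid burning CPU in the busy loop

The main thread only flips dump_requested instead of calling start() each time, which keeps the Thread.start() overhead out of the data path.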
FINAL EDIT
OK, my 'real final solution' also had a delay.
Even though I moved the dumping job to another QThread, the main thread was still delayed by roughly the pickle dumping time! That was weird.
But I found the reason. It was neither emit() performance nor anything else I had suspected.
The reason was, embarrassingly, Python's Global Interpreter Lock, which prevents two threads in the same process from running Python code at the same time.
So I should probably use the multiprocessing module in this case.
I'll post the result after modifying my code to use the multiprocessing module.
Edit after using the multiprocessing module, and future attempts
Using the multiprocessing module
Using multiprocessing
module solved the issue of running python code concurrently, but the new essential problem arised. The new problem was 'passing shared memory variables between processes takes considerable amount of time' (in my case, passing deque
object to child process took 1~2 seconds). I found that this problem cannot be removed as long as I use multiprocessing
module. So I gave up to use `multiprocessing module
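A minimal sketch of what I mean (illustrative names and dummy data, not my actual code): Connection.send() has to pickle the whole deque to move it across the process boundary, and that serialization is where the 1~2 second cost shows up.

import pickle
import time
from collections import deque
from multiprocessing import Pipe, Process

def dump_worker(conn, path):
    data = conn.recv()                        # receives an unpickled copy of the deque
    pickle.dump(data, open(path, "wb"))

if __name__ == "__main__":
    data_deque = deque(({"data": i, "createdTime": time.time()} for i in range(100000)),
                       maxlen=100000)
    parent_conn, child_conn = Pipe()
    proc = Process(target=dump_worker, args=(child_conn, "data.pickle"))
    proc.start()

    start = time.time()
    parent_conn.send(data_deque)              # serializes the whole deque in the parent process
    print("send() took", time.time() - start, "seconds")
    proc.join()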
Possible future attempts
1. Doing only File I/O in QThread
The essential problem with pickle dumping is not writing to the file, but serializing the object before writing it. Python releases the GIL while it writes to a file, so disk I/O can be done concurrently in a QThread. The problem is that serializing the deque object to bytes inside pickle.dump takes some amount of time, and during that time the main thread is blocked because of the GIL.
Hence, the following approach will effectively decrease the delay.
1. Somehow stringify the data object every time onData() is called and push it onto the deque.
2. In PickleDumpingThread, just join the list(deque) to stringify the whole deque.
3. file.write(stringified_deque_object). This can be done concurrently.

Step 1 takes very little time, so it hardly blocks the main thread. Step 2 may take some time, but clearly less than serializing a Python object with pickle.dump. Step 3 does not block the main thread.
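A rough sketch of this idea (the class name StringDumpingThread and the use of json.dumps for per-record stringification are assumptions for illustration, not my actual code):

import json
import time
from collections import deque
from PyQt5.QtCore import QThread

class StringDumpingThread(QThread):
    def __init__(self, string_deque, path):
        super().__init__()
        self.string_deque = string_deque
        self.path = path

    def run(self):
        # Step 2: joining pre-made strings is much cheaper than pickling objects.
        stringified = "\n".join(list(self.string_deque))
        # Step 3: the GIL is released during the actual write, so the main
        # thread is not blocked here.
        with open(self.path, "w") as f:
            f.write(stringified)

class Analyzer():
    def __init__(self):
        self.cnt = 0
        self.dataDeque = deque(maxlen=10000)

    def onData(self, data):
        # Step 1: stringify each record as it arrives; this is cheap.
        self.dataDeque.append(json.dumps({"data": data, "createdTime": time.time()}))
        self.cnt += 1
        if self.cnt % 10000 == 0:
            self.thread = StringDumpingThread(self.dataDeque, "data.jsonl")
            self.thread.start()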
2. Using a C extension
We can manually release and reacquire the GIL in a custom C-extension module, but this might be messy.
3. Porting from CPython to Jython or IronPython
Jython and IronPython are alternative Python implementations built on Java and C#, respectively. They don't use a GIL, which means threads really run in parallel.
One problem is that PyQt is not supported on these implementations.
4. Porting to another language
..
Note: json.dump also took 1~2 seconds for my data.
Cython is not an option for this case. Although Cython has with nogil:, only non-Python objects can be accessed inside that block (a deque object cannot be), and we can't call pickle.dump there.