
My GUI freezes during a file save operation that takes some time, and I'd love to understand why.

I've followed the instructions in Schollii's wonderful answer to a similar question, but there must be something I'm missing, because I cannot get the GUI to behave as I expect.

The example below is not runnable, since it shows only the relevant parts, but hopefully it's enough to get a discussion going. Basically, I have a main application class that generates some large data, which I need to save in HDF5 format, and this takes some time. To keep the GUI responsive, the main class creates an object of the Saver class and a QThread to do the actual data saving (using moveToThread).

The output of this code is pretty much what I would expect (i.e. I see a message that the "saving thread" has a different thread id than the "main" thread) so I know that another thread is being created. The data is successfully saved, too, so that part is working correctly.

During the actual data saving however (which can take some minutes), the GUI freezes up and goes "Not responding" on Windows. Any clues as to what is going wrong?

Stdout during running:

outer thread "main" (#15108)
<__main__.Saver object at 0x0000027BEEFF3678> running SaveThread
Saving data from thread "saving_thread" (#13624)

Code sample:

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import QThread, pyqtSignal, pyqtSlot, QObject

class MyApp(QtWidgets.QMainWindow, MyAppDesign.Ui_MainWindow):

    def save_file(self):
        self.save_name, _ = QtWidgets.\
            QFileDialog.getSaveFileName(self)


        QThread.currentThread().setObjectName('main')
        outer_thread_name = QThread.currentThread().objectName()
        outer_thread_id = int(QThread.currentThreadId())
        # print debug info about main app thread:
        print('outer thread "{}" (#{})'.format(outer_thread_name,
                                               outer_thread_id))

        # Create worker and thread to save the data
        self.saver = Saver(self.data,
                           self.save_name,
                           self.compressionSlider.value())
        self.save_thread = QThread()
        self.save_thread.setObjectName('saving_thread')
        self.saver.moveToThread(self.save_thread)

        # Connect signals
        self.saver.sig_done.connect(self.on_saver_done)
        self.saver.sig_msg.connect(print)
        self.save_thread.started.connect(self.saver.save_data)
        self.save_thread.start()

    @pyqtSlot(str)
    def on_saver_done(self, filename):
        print('Finished saving {}'.format(filename))


''' End Class '''


class Saver(QObject):
    sig_done = pyqtSignal(str)  # filename: emitted at end of save_data()
    sig_msg = pyqtSignal(str)  # message to be shown to user

    def __init__(self, data_to_save, filename, compression_level):
        super().__init__()
        self.data = data_to_save
        self.filename = filename
        self.compression_level = compression_level

    @pyqtSlot()
    def save_data(self):
        thread_name = QThread.currentThread().objectName()
        thread_id = int(QThread.currentThreadId())  
        self.sig_msg.emit('Saving data '
                          'from thread "{}" (#{})'.format(thread_name,
                                                          thread_id))

        print(self, "running SaveThread")
        h5f = h5py.File(self.filename, 'w')
        h5f.create_dataset('data',
                           data=self.data,
                           compression='gzip',
                           compression_opts=self.compression_level)
        h5f.close()
        self.sig_done.emit(self.filename)


''' End Class '''
jat255

1 Answer


There are actually two issues here: (1) Qt's signals and slots mechanism, and (2) h5py.

First, the signals/slots. These actually work by copying arguments passed to the signal, to avoid any race conditions. (This is just one of the reasons you see so many signals with pointer arguments in the Qt C++ code: copying a pointer is cheap.) Because you're generating the data in the main thread, it must be copied in the main thread's event loop. The data is obviously big enough for this to take some time, blocking the event loop from handling GUI events. If you instead (for testing purposes) generate the data inside the Saver.save_data() slot, the GUI remains responsive.

However, you'll now notice a small lag after the first "Saving data from thread..." message is printed, indicating that the main event loop is blocked during the actual save. This is where h5py comes in.

You're presumably importing h5py at the top of your file, which is the "correct" thing to do. I noticed that if you instead import h5py directly before you create the file, this goes away. My best guess is that the global interpreter lock is involved, as the h5py code is visible from both the main and saving threads. I would have expected that the main thread would be entirely inside Qt module code at this point, however, over which the GIL has no control. So, like I said, I'm not sure what causes the blocking here.

As for solutions, to the extent you can do what I described here, that will alleviate the problem. Generating the data outside the main thread, if possible, is advisable. It may also be possible to pass some memoryview object, or a NumPy view of the array, to the saving thread, though you'll then have to deal with thread synchronization yourself. Also, importing h5py inside the Saver.save_data() slot will help, but isn't feasible if you need the module elsewhere in the code.
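The "import inside the slot" workaround is just a deferred import. Here is a minimal sketch of that pattern using only the standard library. It assumes `gzip` as a stand-in for `h5py` and a plain `threading.Thread` as a stand-in for the `QThread`; the function names are made up for illustration. It shows the shape of the workaround only and does not reproduce the freeze or the fix.

```python
import threading


def save_in_worker(payload: bytes, path: str) -> None:
    # Deferred import: the import statement runs here, on the worker thread,
    # instead of at the top of the file. gzip is only a stand-in for h5py.
    import gzip

    with gzip.open(path, "wb") as fh:
        fh.write(payload)


def run_save(payload: bytes, path: str) -> None:
    # Hypothetical helper: start the save on a separate thread.
    worker = threading.Thread(target=save_in_worker, args=(payload, path))
    worker.start()
    # A GUI would react to a "done" signal instead of blocking on join().
    worker.join()
```

Python caches imported modules, so repeated calls to `save_in_worker` only pay the import cost once.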

Hope this helps!

bnaecker
  • Thanks for the reply. I hadn't considered the time needed to copy the data. And you're right, if I load up some test data in `Saver.save_data()` and then save it, everything stays responsive. The issue is that I have another class that generates the data, and my main application holds an instance of that class, and so passes the data to the `Saver` that way. I'm not sure how to go about fixing that, but I'll play with the `memoryview` and Numpy views like you suggested – jat255 Nov 02 '17 at 19:10
  • In more testing, I don't think it's the copying data in memory that's locking things up. If that were the case, wouldn't the lock up happen when initializing the `Saver` object (i.e. the `self.data = data_to_save` line in the constructor)? It appears the lockup happens during the actual saving to disk (with the `h5py` stuff) – jat255 Nov 02 '17 at 19:24
  • @jat255 Yep, you're right, I don't know why I thought that the data was emitted as a signal argument. I think I changed some stuff to make a working example. Anyway the second point about importing `h5py` locally still seems to be relevant. – bnaecker Nov 02 '17 at 21:21
  • Interestingly, if I save as a `numpy` object directly (.npy or .npz), I don't get any sort of lockup, so this is specific to something happening in `h5py`, it would seem – jat255 Nov 03 '17 at 14:59
  • @jat255 Yeah, I was digging around in `h5py` code, and it seems that it does some locking if it notices multiple threads are accessing the same file. It's not documented, as far as I can see, but the `h5py._objects` shared library has a `FastRLock` object, which seems to implement the locking in this case. – bnaecker Nov 03 '17 at 15:03
  • Thanks again for looking into all this. I've investigated `tables` as an alternative to access the HDF5 library, and it seems to work if I save using `carray`, but causes the same lockup if using `array`. I'm not sure why, but it seems to be working well enough for me. I'll leave the question open for a few days, but if no other answers come in I'll accept yours. Thanks again! – jat255 Nov 03 '17 at 15:30
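To illustrate the locking hypothesis from the comments: if a library holds one process-wide lock for the whole duration of a long write, any other thread that needs that lock has to wait, even though the write runs on its own thread. The sketch below is an analogy only. It assumes `threading.RLock` as a stand-in for the `FastRLock` mentioned above, and it does not use `h5py` or claim to show how `h5py` actually takes or exposes its lock.

```python
import threading

# Stand-in for a library-wide lock (assumption: not h5py's real lock).
library_lock = threading.RLock()


def slow_save(started, release):
    """Pretend to be a long write that holds the library lock throughout."""
    with library_lock:
        started.set()    # tell the main thread the "save" has begun
        release.wait()   # simulate the long write; the lock stays held


def can_take_lock(timeout=0.1):
    """Return True if the calling thread can get the library lock in time."""
    acquired = library_lock.acquire(timeout=timeout)
    if acquired:
        library_lock.release()
    return acquired


def demo():
    started, release = threading.Event(), threading.Event()
    worker = threading.Thread(target=slow_save, args=(started, release))
    worker.start()
    started.wait()
    blocked = not can_take_lock()   # worker still holds the lock
    release.set()                   # let the "save" finish
    worker.join()
    free_again = can_take_lock()    # lock is available after the save
    return blocked, free_again


if __name__ == "__main__":
    print(demo())  # prints (True, True)
```

If the GUI thread ever needs such a lock while the save is running, it would stall in the same way the main thread does here during the `blocked` check.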