
I have encountered a strange problem with my Qt-based multithreaded application: after several days of running, the application freezes and stops responding.

After the freeze occurred, I confirmed that several threads, including the main thread, were in the futex_wait_queue_me state. When I attached GDB to the application to investigate, the backtraces showed that all of these threads had stopped in the following function, with the same argument futex=0x45a2f8b8 <main_arena>:

__lll_lock_wait_private (futex=0x45a2f8b8 <main_arena>)

I know that on Linux, calling non-async-signal-safe functions from a signal handler is one possible cause of this state, i.e. several threads waiting on the same mutex (and I can confirm from the backtraces that they all stopped in malloc()/free()-related calls). However, after checking my Qt application, I could not find any code that installs Linux signal handlers. (I am also not sure whether the Qt core library uses Linux signal handlers in its signal/slot mechanism.)

I am sorry that I cannot provide source code for this question, because it is a huge project. Could you suggest some possible reasons for this phenomenon, or give some advice on how to debug it?

Thanks in advance.

UPDATE 1:

I can provide backtraces, but I had to remove some sensitive information.

Backtrace of sub thread:

#0 in __lll_lock_wait_private (futex=0x4ad078b8 <main_arena>)
#1 in __GI___libc_malloc (bytes=32) at malloc.c:2918
... ...
#11 in SystemEventImp::event(QEvent*) () 
#12 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#13 in QApplication::notify(QObject*, QEvent*) ()
#14 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#15 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
#16 in QCoreApplication::sendPostedEvents (receiver=0x0, event_type=0) at kernel/qcoreapplication.cpp:1329
#17 in QWindowSystemInterface::sendWindowSystemEvents (flags=...) at kernel/qwindowsysteminterface.cpp:560
#18 in QUnixEventDispatcherQPA::processEvents (this=0x8079958, flags=...) at eventdispatchers/qunixeventdispatcher.cpp:70
#19 in QEventLoop::processEvents (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:136
#20 in QEventLoop::exec (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:212
#21 in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1120
#22 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1220
#23 in QApplication::exec () at kernel/qapplication.cpp:2689
#24 in main(argc=2, argv=0xbffff294)

Backtrace of main thread:

#0 in __lll_lock_wait_private (futex=0x4ad078b8 <main_arena>) at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:32
#1 in __GI___libc_malloc (bytes=8) at malloc.c:2918
... ...
#15 in QGraphicsView::paintEvent(QPaintEvent*) ()
#16 in QWidget::event(QEvent*) () 
#17 in QFrame::event(QEvent*) () 
#18 in QGraphicsView::viewportEvent(QEvent*) ()
#19 in Platform::Drawing::GraphicsView::viewportEvent(QEvent*) ()
#20 in QAbstractScrollAreaFilter::eventFilter(QObject*, QEvent*) ()
#21 in QCoreApplicationPrivate::cancel_handler(QObject*, QEvent*) ()
#22 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#23 in QApplication::notify(QObject*, QEvent*) ()
#24 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#25 in QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int, QPainter*, QWidgetBackingStore*) [clone .part.175] () 
#26 in QWidgetBackingStore::sync() ()
#27 in QWidgetPrivate::syncBackingStore() ()
#28 in QWidget::event(QEvent*) ()
#29 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
#30 in QApplication::notify(QObject*, QEvent*) ()
#31 in QCoreApplication::notifyInternal(QObject*, QEvent*) ()
#32 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
#33 in QCoreApplication::sendPostedEvents (receiver=0x809ea50, event_type=77)
#34 in QGraphicsViewPrivate::dispatchPendingUpdateRequests (this=0x80e4418)
#35 in QGraphicsScenePrivate::_q_processDirtyItems (this=0x80de238) at graphicsview/qgraphicsscene.cpp:508
#36 in QGraphicsScene::qt_static_metacall (_o=0x80d1a80, _c=QMetaObject::InvokeMetaMethod, _id=15, _a=0x865e238)
#37 in QMetaCallEvent::placeMetaCall (this=0x898d020, object=0x80d1a80)
#38 in QObject::event (this=0x80d1a80, e=0x898d020) at kernel/qobject.cpp:1070
#39 in QGraphicsScene::event (this=0x80d1a80, event=0x898d020) at graphicsview/qgraphicsscene.cpp:3478
#40 in QApplicationPrivate::notify_helper (this=0x8077ba0, receiver=0x80d1a80, e=0x898d020) at kernel/qapplication.cpp:3457
#41 in QApplication::notify (this=0x8077970, receiver=0x80d1a80, e=0x898d020) at kernel/qapplication.cpp:2878
#42 in QCoreApplication::notifyInternal (this=0x8077970, receiver=0x80d1a80, event=0x898d020) at kernel/qcoreapplication.cpp:867
#43 in QCoreApplication::sendEvent (receiver=0x80d1a80, event=0x898d020) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:232
#44 in QCoreApplicationPrivate::sendPostedEvents (receiver=0x0, event_type=0, data=0x8073318) at kernel/qcoreapplication.cpp:1471
#45 in QCoreApplication::sendPostedEvents (receiver=0x0, event_type=0) at kernel/qcoreapplication.cpp:1329
#46 in QWindowSystemInterface::sendWindowSystemEvents (flags=...) at kernel/qwindowsysteminterface.cpp:560
#47 in QUnixEventDispatcherQPA::processEvents (this=0x8079958, flags=...) at eventdispatchers/qunixeventdispatcher.cpp:70
#48 in QEventLoop::processEvents (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:136
#49 in QEventLoop::exec (this=0xbfffef50, flags=...) at kernel/qeventloop.cpp:212
#50 in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1120
#51 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1220
#52 in QApplication::exec () at kernel/qapplication.cpp:2689
#53 in main(argc=2, argv=0xbffff294)

UPDATE 2:

In response to the valuable comments on this question, I have also shared several detailed backtrace files at the following link: 1drv.ms/f/s!AlojS_vldQMhjHRlTfU9vwErNz-H . Please refer to Readme.txt for some explanation and for the libc version I used. By the way, when I replaced system() with vfork()/waitpid(), the freeze no longer appeared. I do not know the reason.

Thank you all in advance.

Steve Folly
gzh
  • Are the threads waiting on a condition variable? (ex. when the queue is empty, wait until there's at least one value in it to do something). – AlexG Mar 01 '19 at 01:37
  • @AlexG, besides the main thread, one of the other threads registers some event handlers, maintains an event queue, and runs an endless loop. If an event can be retrieved from the queue, the event handler is called. I have confirmed that the push/pop operations on the event queue are protected by a mutex against race conditions. – gzh Mar 01 '19 at 01:55
  • Well, an infinite loop is indeed another way of waiting on a queue. Without knowing the code, that's as far as I can get. – AlexG Mar 01 '19 at 02:10
  • @AlexG, sorry, I cannot follow your idea about waiting on a queue. Is there a scenario where waiting on a queue would cause this phenomenon? Could you give me some details? – gzh Mar 01 '19 at 02:17
  • if you were using a [std::condition_variable](https://fr.cppreference.com/w/cpp/thread/condition_variable) and not locking the mutex properly, you could end up in a case where your thread waits indefinitely to be notified on the condition_variable; but since you are actively waiting on your queue, that's not a possibility. In short, it's a mechanism to notify 1..n threads whenever a 'condition' becomes true. In your example that would have been "there's something in the queue". Such CVs are used in edge cases (like when the queue is empty and needs to pop, or full and needs to push). – AlexG Mar 01 '19 at 02:22
  • @AlexG, After I grep my source code, I confirmed that std::condition_variable was not used. – gzh Mar 01 '19 at 04:21
  • @gzh Do any of the lines you removed from the backtrace say ``? Are there any other threads with such a backtrace? We really need unobfuscated backtraces to get an idea what is going wrong. – Florian Weimer Mar 02 '19 at 12:31
  • @FlorianWeimer, I am sure that none of the lines I removed have anything to do with a signal handler; what I deleted is only some business logic. – gzh Mar 02 '19 at 13:10
  • @gzh Is there anything involving `fork` and `_IO_*` functions in the backtraces? – Florian Weimer Mar 02 '19 at 13:57
  • @FlorianWeimer, besides this backtrace, in other backtraces I can confirm that sometimes __libc_system() was called, but the phenomenon did not change: several threads stopped to wait on the same futex. – gzh Mar 02 '19 at 14:28
  • @FlorianWeimer, I have found this glibc bug, https://bugzilla.redhat.com/show_bug.cgi?id=906468, but I am not sure whether it could cause my phenomenon, i.e. waiting on the same futex. – gzh Mar 02 '19 at 14:35
  • 1
    @gzh Yes, that's why I asked for more backtraces. There are several common application and glibc bugs in this area. If you are not comfortable sharing *all* the backtraces publicly, you'll have to open a support case with your vendor, otherwise we can't tell if there's a workaround you can apply (or if the bug is in the application). – Florian Weimer Mar 02 '19 at 14:55
  • @FlorianWeimer, I have contacted our vendor; they gave me several patches related to the glibc bug I mentioned above (I think they retrieved them from the glibc repository). After applying those patches, the phenomenon became stable, i.e. several threads waiting on the same futex can now be reproduced. By the way, when I replace fork() with vfork(), my application runs smoothly without any freezes. I am not sure what kind of backtrace information would help you judge what is wrong; I am glad to provide more information on request. – gzh Mar 04 '19 at 02:47
  • @gzh All backtraces that are distinct in the glibc parts would help, particularly if they show the `fork`, `*atfork*` or `_IO_*` functions. Please also add the full glibc version to your question (including the vendor patch level). Some distributions change the locking in `fork`, which could be causing this. – Florian Weimer Mar 04 '19 at 06:14
  • @FlorianWeimer, thank you very much for your quick reply. I have shared several backtrace files for you at the following link: https://1drv.ms/f/s!AlojS_vldQMhjHRlTfU9vwErNz-H Please refer to Readme.txt for some explanation. – gzh Mar 04 '19 at 07:24

3 Answers

2

Without source code, it is hard to answer the question definitively. In my experience with multithreaded programs, it is really easy to overlook some place where a deadlock can occur. In your case it sounds like something that is very unlikely to happen; however, I would bet that somewhere in your code you have a potential deadlock.

I would advise you to draw out the whole environment in a diagram and look at which threads use which shared resources, and when and where the mutexes come in.

But as I said in the beginning, without further information it's hard to say.

R.S.
  • Thanks for the quick answer. Most deadlocks involve two or more different mutexes, and those I can diagnose easily. But I have no idea why these threads are all waiting on the same mutex. – gzh Mar 01 '19 at 02:11
  • @gzh I'm sorry, but without looking at your code, this will be difficult to solve. – R.S. Mar 01 '19 at 02:20
  • I cannot show the source code, but I can post backtraces here. I have updated my question with backtrace information. – gzh Mar 01 '19 at 03:16
2

From the backtrace, it seems malloc was called while Qt was trying to post an event.

If you send events across threads, Qt will queue the events for you. But these queued events can fill up your memory if the queue is not drained. Then you could get weird behavior from malloc, because there is no memory left.

  • Do you have a means to monitor the memory usage of your program, to see whether the freeze happens every time memory fills up?
  • Can you reduce the memory available to the system and see whether the problem occurs more often?

If the above is indeed the issue, then you might take a look at this thread for the solution.

Stove
  • Thanks for your reply. When the freeze occurred, I checked memory usage; there was enough free memory. Also, in my application I set an upper limit on the message queue: if too many events are appended to the queue, some events are discarded. – gzh Mar 05 '19 at 05:07
0

If you are using signals and slots to communicate across threads, you should understand the different connection types.

  • Auto Connection (default): if the signal is emitted in the thread with which the receiving object has affinity, the behavior is the same as with a Direct Connection. Otherwise, the behavior is the same as with a Queued Connection.
  • Direct Connection: the slot is invoked immediately when the signal is emitted. The slot is executed in the emitter's thread, which is not necessarily the receiver's thread.
  • Queued Connection: the slot is invoked when control returns to the event loop of the receiver's thread. The slot is executed in the receiver's thread.
  • Blocking Queued Connection: the slot is invoked as for the Queued Connection, except that the current thread blocks until the slot returns. Note: using this type to connect objects in the same thread will cause a deadlock.

More here: https://doc.qt.io/archives/qt-5.6/threads-qobject.html

The question does need some code context, though. Does this behavior occur when you are passing data to the UI? If so, are you using QWidgets, QML, ...? A lot of Qt patterns rely on signals/slots when rendering data to the UI.

user3474985