2

I have an application written in C++ & Qt that does a lot of network requests. The basic outline of my code is below:

{
    QNetworkReply* reply = networkAccessManager().get( QNetworkRequest( url ) );
    assert( reply );

    connect( reply, &QNetworkReply::finished, [=]
    {
        // do action based on the contents of the reply

        assert( reply->isFinished() );
        reply->deleteLater();
    });
}

The code keeps multiple requests in flight at the same time. Both of the asserts have never fired.

Randomly (about every 200000 requests), the deferred delete of this reply fails with something that appears to be an double free. This happens both in Qt 5.0.2 and Qt 5.2.x. I've ran valgrind with the following result:

==18792== Invalid read of size 8
==18792==    at 0x53AAC7A: QObject::~QObject() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x4EB60A8: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Network.so.5.0.2)
==18792==    by 0x53A4357: QObject::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537EBBC: QCoreApplication::notify(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537E8BD: QCoreApplication::notifyInternal(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5380AC5: QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x53C38D4: QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537D88A: QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x51F422A: QThread::exec() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x51F8A4A: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5812B4F: start_thread (pthread_create.c:304)
==18792==    by 0x62A1A7C: clone (clone.S:112)
==18792==  Address 0xb9fd670 is 0 bytes inside a block of size 16 free'd
==18792==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==18792==    by 0x53A4357: QObject::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537EBBC: QCoreApplication::notify(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537E8BD: QCoreApplication::notifyInternal(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5380AC5: QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x53C38D4: QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537D88A: QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x538115F: QCoreApplication::exec() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x4093C4: main (main.cpp:38)

I think that the following things are certainly true:

  • The main thread deletes an resource.
  • The network thread then tries to delete the same resource. Which then results in an double delete. Which fails in some way

I think, but cannot verify, that the following things are true:

  • this resource is an QNetworkReply. It could be some Qt internal resource that i do not know about. Apart from this QNetworkReply, my codebase does not contain any QObjects that are frequently created or destructed.

I'm having difficulties in my attempts to solve this error. From the inspection of the stack trace, it appears the reply->deleteLater() signal is somehow delivered to both the network thread and the main thread. But i don't see how that could be the case. The signal and slot programming style makes it very hard to see where exactly something goes wrong.

How would i approach debugging this error?


An answer mentions a possible origin in synchronization. In my codebase only 1 class is permitted to be called from a different thread. The functions from this class fall in 2 categories:

  1. query some internal state.
  2. emit an signal.

The second category is implemented as:

class Foo {
  Q_OBJECT
public:
  void Foo() {
    connect( foo, &Foo::doSomethingSignal, this, &Foo:doSomethingInternal, Qt::QueuedConnection );
  }

  // this functions gets called from various threads
  void doSomething() {
    emit( doSomethingSignal() );
  }

private slots:
  // this function happens synchronized in the main thread
  void doSomethingInternal() {
    ...
  }

signals:
  void doSomethingSignal();
}

According to this stackoverflow question: emit Qt signal from non Qt Thread or ouside Qt main event loop with at 4.5 this is safe. The caller is not an QObject.

Community
  • 1
  • 1
Stefan
  • 1,539
  • 13
  • 11
  • 1
    Say if you do find the error, the issue now is if the fix easy or hard. For issues like this, I would think that changing the code to use a std::shared_ptr may be what's needed. http://stackoverflow.com/questions/9127816/stdshared-ptr-thread-safety-explained You just need to know what the shared resource is, create a shared_ptr and use that. – PaulMcKenzie Apr 04 '14 at 01:39
  • I don't think that you have knowledge about how Qt's resource system works? The QNetworkReply is owned by the network thread, and should be deleted using the `deleteLater()` slot. The network reply is then deleted the next time the network thread returns to the event loop. There is no resource leak, since `QNetworkReply::finished` is always called even if the reply timed out. – Stefan Apr 04 '14 at 02:18
  • 1
    I assume you've tried commenting out the `reply->deleteLater()` line and verified that the memory corruption no longer occurs? At least until you run out of memory anyway ... – njahnke Apr 04 '14 at 02:29
  • Why don't you post the whole code (main and network threads)? It is hard to tell more without seeing more code. – László Papp Apr 04 '14 at 02:45
  • @Laszlo The network thread is a concept that exists completely inside the Qt library. – Stefan Apr 04 '14 at 13:25
  • @njahnke I tried it without `reply->deleteLater()` as you suggested, and the crash did not occur. I do however run out of memory in about half an hour. Since i suspect this is a bug in Qt itself, im going to try to reproduce the bug in a minimum working sample. – Stefan Apr 06 '14 at 19:06
  • Do you check whether the QNetworkReply object has any errors before you read from it? – njahnke Apr 06 '14 at 22:07
  • @jahnke Yes, I've checked all cases and error checking is used everywhere before reading the reply. – Stefan Apr 07 '14 at 22:57

2 Answers2

1

Answering my own question:

I've made a test case with fairly minimum functionality. After some testing, it appears that the bug exists in my test case. Which is why i have made a bug report: https://bugreports.qt-project.org/browse/QTBUG-38309

The testcase can be viewed here: https://bitbucket.org/sdessens/qnetworkreply-access-violation-testcase/overview

There exists a workaround that involves waiting a few hundred milliseconds before deleting the reply, this works fine in the test case, but not in my application for some reason (after a few minutes, the event loop appears to stop working). The impact of random crashes is not earth scattering, so for now i am sticking a a while loop in bash to keep my application running until the Qt devs resolve this issue.

Stefan
  • 1,539
  • 13
  • 11
0

From your Valgrind error report, it looks like program us trying to read the memory after some other thread has freed it. It does not looks like double free scenario rather than use after free.

==18792== Invalid read of size 8

==18792== Address 0xb9fd670 is 0 bytes inside a block of size 16 free'd

You may check out my earlier post on Valgrind and how GDB/Valgrind can be used together to perform the live debugging at the time of first error reported by your program.

This problem seems to due to synchronization problem between threads and some time one thread is freeing it which other thread is not aware of. Memory related problem in multi-threaded environments are very difficult to understand and resolve. You may want to think of using the suggestion provided in the above comments(smart pointer in C++/RAII based classes).

Community
  • 1
  • 1
Mantosh Kumar
  • 5,659
  • 3
  • 24
  • 48
  • @Stefan:Well you have mentioned that object vtable is corrupted.Internally once free happens, most of the implementation(heap allocator) would write some forward and back pointer information of chunk in the first 4-8 byte of a chunk. As VPTR would be on the zero offset of this object and hence it might have overwritten by some pointer internally by heap allocator. So memory has been freed for sure and now its being accessed/read by the program. – Mantosh Kumar Apr 04 '14 at 02:29
  • (re-posting since i missed the 5min edit window) Double free or use after free... that is the same in this situation. Inspection of the assembly code leads me to believe that the object's vtable is corrupt (of better said, already free'd by something else). Additionally, the exact purpose of deleteLater() is to delete the object in a synchronized way. The documentation even mentions that it is safe to call the deleteLater() method multiple times. I find i hard to believe that such basic functionally that has been in Qt for ages contains bugs. – Stefan Apr 04 '14 at 02:36
  • @Stefan:Well my explanation does not point that there seems to be problem in Qt library. I feel that there is some problem in your code due to which you are trying to read the memory once freed has happened. This seems to be the program error rather than library because program(client) is using/passing the illegal memory. Well I can not comment more on this as I am not very much aware of Qt library. – Mantosh Kumar Apr 04 '14 at 02:39
  • Your answer points to an problem in synchronization. The thing is that 'the network thread' concept is something entirely manged by Qt. Users don't even have to know it exists (except for one detail, which is that `deleteLater()` should be used to destroy its resources). I do use threads, but i believe my usage is correct. I'll update the question in a sec. – Stefan Apr 04 '14 at 12:28