I have solved this problem myself and the bounty will not be awarded. The problem arose as a consequence of a GUI operation that was being initiated by a non-GUI thread.

Qt 4.7 OSX 10.6.8

There's a lot of code in the app, but not a whole lot involved with what's going on.

The data memory leak occurs in the context of a single connection, which is opened, read, written and closed within a single Qt thread. I'm using a fixed memory object (pMsg) to hold my messages, then sending them to the external device like this:

m_pTcpSocket->write((char*)pMsg->Buf8, (qint64)pMsg->GetLength());

Buf8 is a 2048-byte static array. GetLength is the first 16 bits of the message ANDed with 0xFF, so a number from 0 to 255. It should return 4 for these messages, and always has in my diagnostics. Both operations are surrounded by their own mutexes (meaning, different mutexes). The message lengths are typically 4 bytes. The messages dependably get to the receiving device elsewhere on our wired LAN; they're correct when they arrive and the device responds appropriately with an ACK specific to only those messages. I've tried adding a call to flush() afterwards; it doesn't help (nor should there be anything to flush, but...). I don't know that the leak is in the write().

Sending these messages in turn causes me to receive an ACK message from the device. I read it like this:

if (m_pTcpSocket->waitForReadyRead(100))
{
    while ((bytesavailable = m_pTcpSocket->bytesAvailable()))
    {
        m_pTcpSocket->read(RBuf, bytesavailable);
        AssembleMsg(Buf, bytesavailable); // state machine empties Buf
    }
}

After the loop, bytesavailable is zero (of course). Buf is an unsigned char pointer to a 2048-byte static array of unsigned chars upon which, after each portion of data is received, I run a simple state machine that assembles the messages. Message lengths are 4. Messages are received and assembled as expected; no memory allocations are made, nor objects declared. Both operations are surrounded by their own mutexes (meaning, different mutexes, so they can't interact between rx and tx).

Once the message is assembled, all it does is reset a counter that sets the delay to the next keepalive message (which is what these are; without them, the device will drop the connection). The delay is accumulated by counting after the waitForReadyRead(100), which counts intervals of that length as long as the device sends nothing to this port, which is typical behavior. In this way, no timer is required. The timing works fine. Messages are read as soon as they arrive, or at least within 100 ms. They don't accumulate, so I thought the read buffer would not get larger. But... I don't know. Something is getting larger!

So that's the read. But I don't know that the leak is in the read(), either.
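
The receive-side bookkeeping described above (assemble fixed 4-byte messages, count 100 ms timeouts, reset the count when an ACK arrives) can be sketched roughly as follows. This is only an illustration under stated assumptions: `MsgAssembler` and `KeepaliveCounter` are hypothetical names, not the classes in the app, and the real state machine presumably also validates message contents. Neither class allocates memory, which matches the description of this path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-length assembler: collects bytes until one
// 4-byte message (the length stated in the question) is complete.
class MsgAssembler {
public:
    static constexpr std::size_t kMsgLen = 4;

    // Feeds one received byte; returns true when a full message
    // has just been completed and can be read via message().
    bool feed(std::uint8_t byte) {
        msg_[fill_++] = byte;
        if (fill_ < kMsgLen) return false;
        fill_ = 0;  // start over for the next message
        return true;
    }

    const std::uint8_t* message() const { return msg_; }

private:
    std::uint8_t msg_[kMsgLen] = {};
    std::size_t fill_ = 0;
};

// Hypothetical keepalive scheduler driven by wait timeouts instead of a timer.
class KeepaliveCounter {
public:
    explicit KeepaliveCounter(int ticksPerKeepalive) : limit_(ticksPerKeepalive) {
        assert(limit_ > 0);
    }

    // Call once per timed-out 100 ms wait; returns true when a keepalive is due.
    bool onTimeout() {
        if (++ticks_ < limit_) return false;
        ticks_ = 0;
        return true;
    }

    // Call when an ACK has been assembled; restarts the delay.
    void onAck() { ticks_ = 0; }

private:
    int limit_;
    int ticks_ = 0;
};
```

With a limit of 10, for example, a keepalive would become due after ten consecutive quiet 100 ms waits, i.e. about once a second.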

BUT it HAS to be one or the other. If I don't send these messages (which means I don't get the ACK messages, either), then there is no leak. Nothing else changes anywhere in the application. This is the mode it powers up in, and no other activity is going on, I'm just keeping the connection open so when it's time to run the radio, the port is ready to go.

Both of these run in the same thread, and they both run off of the same socket. The thread runs continuously, and the same socket remains open (for hours, in fact.) So it's not a socket object delete issue.

The problem is exacerbated with certain brands of SDR radios, as they require the keepalive during receive operation, which means the app sits there and chews up memory like crazy when receiving as WELL as when it is sitting there just waiting to go.

I'm losing about 250 megabytes in approximately 12 hours, in chunks somewhere under 100k. I can watch the app memory increase 0.1 MB at a time, about once a second.

I have googled extensively, and all I can find talks about is failing to delete the tcp object over multiple connections, which is definitely not the issue here.

I'm really at a loss. Is the problem related to my use of the socket in a thread? The application (a very complex software defined radio app) runs anywhere from 10 to 16 threads, depending on what it's doing; I run the data transfers in their own thread so they aren't compromised by anything that ties up the main event loop.

I've tried valgrind, but it terminates the app a bit after it tries to start it, well before any of this gets going. I don't think it likes threading, or something. Or maybe it's 10.6.8, but anyway, it doesn't work. Qt 4.7 doesn't integrate it anyway. I know of no way to track memory use from within the application so that I could wrap each send and receive and at least figure out which one (or both?) is responsible.
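
One coarse way to watch memory from inside the process is the POSIX call `getrusage`, which is available on both OS X and Linux. The following is a minimal sketch, assuming peak resident set size is a good enough proxy for the leak. `peakRssBytes` and `RssProbe` are hypothetical helpers introduced here for illustration; `ru_maxrss` is reported in bytes on OS X but in kilobytes on Linux.

```cpp
#include <sys/resource.h>

#include <cassert>
#include <cstdio>

// Peak resident set size of this process in bytes, or -1 on failure.
long long peakRssBytes() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
#ifdef __APPLE__
    return static_cast<long long>(usage.ru_maxrss);         // already bytes
#else
    return static_cast<long long>(usage.ru_maxrss) * 1024;  // kilobytes -> bytes
#endif
}

// Hypothetical scope guard: reports how much the peak RSS grew
// between construction and destruction.
class RssProbe {
public:
    explicit RssProbe(const char* label)
        : label_(label), before_(peakRssBytes()) {
        assert(label_ != nullptr);
    }

    ~RssProbe() {
        const long long grown = peakRssBytes() - before_;
        if (grown > 0) {
            std::fprintf(stderr, "%s: peak RSS grew by %lld bytes\n", label_, grown);
        }
    }

private:
    const char* label_;
    long long before_;
};
```

Wrapping the send and the receive in separate scopes, e.g. `{ RssProbe p("tx"); /* write */ }`, might show which side grows. Peak RSS moves in page-sized steps and never shrinks, so only the trend over many iterations is meaningful.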

*** edit: By changing the rate of the keepalive message, I directly change the rate of the memory leak, and as I think I said above, if the keepalive isn't being sent, there's no memory loss at all.

That's all I can think of to tell you folks; any suggestions are welcome, any illumination about TCP quirks in Qt would be welcome, basically anything. I've spent many days on this and I'm just stonewalled at this juncture.

fyngyrz
  • You wrote a lot of keystrokes, seemingly indicating that you can produce a nice short [MCVE](http://stackoverflow.com/help/mcve) to eliminate all doubt, but have you actually tried reproducing the issue in a single threaded [MCVE](http://stackoverflow.com/help/mcve)? – autistic Dec 30 '15 at 04:15
  • I see five function calls that we don't have the code for (4 with `m_pTcpSocket` and `AssembleMsg`). Any one of those could be the culprit. Can you add the code for those? – 1201ProgramAlarm Dec 30 '15 at 04:40
  • People, the issue does not reproduce in a short example. It's probably something I'm doing outside this; but we're talking a very large number of lines of code - the app is about 55k lines of code, not counting anything generated by Qt. And there is a heck of a lot of back and forth TCP and incoming UDP. If it's not obvious here, it's going to be even less obvious elsewhere. It could certainly be elsewhere - I do think I'm doing this right, I'm just out of ideas. I feel kind of bad for offering the bounty, I think I may be wasting people's time. But I don't see a way to withdraw it. – fyngyrz Dec 30 '15 at 06:39
  • Have you tried to use valgrind? – Danh Dec 30 '15 at 08:49
  • @fyngyrz regarding feeling bad: for the remainder of the bounty time, you could add a note to the top of your question stating you have solved your problem yourself and that the bounty will not be awarded. Alternatively, you could distribute the bounty across responders who stated the problem could not be solved with the information that you provided (if that is the correct answer). – Ludwig Schulze Jan 02 '16 at 22:11

5 Answers

7

I found it. Drawing from a non-gui thread was breaking Qt in a very indirect way. Stopped doing that, and it stopped leaking. Thanks everyone.
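
The general remedy is to hand GUI work to the thread that owns the GUI instead of performing it in place. In Qt that is normally done with a queued signal/slot connection or `QMetaObject::invokeMethod` with `Qt::QueuedConnection`. The sketch below is not Qt code; it is a plain C++ illustration of the same idea, and `OwnerThreadQueue` is a hypothetical name. Worker threads only post tasks; the owning thread is the only one that runs them.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical task queue bound to the thread that constructs it
// (standing in for the GUI thread and its event loop).
class OwnerThreadQueue {
public:
    OwnerThreadQueue() : owner_(std::this_thread::get_id()) {}

    // May be called from any thread: enqueue work, do not run it.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Must be called from the owning thread: runs all pending tasks
    // and returns how many were executed.
    int drain() {
        assert(std::this_thread::get_id() == owner_);
        int ran = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (tasks_.empty()) break;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed outside the lock, on the owning thread
            ++ran;
        }
        return ran;
    }

private:
    const std::thread::id owner_;
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

A worker would call `post` with the drawing request, and the owning thread would call `drain` from its own loop, so the drawing code itself never executes on the worker.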

It is @Shf who deserves the credit, but sadly, I didn't really understand bounties that well and I probably told him to get in here and answer too late. I will make it up to him -- when he gets my message -- by offering a bounty on the question where he actually provided the critical hint. The bounty will consist of the rest of my stack overflow rep, including what's been earned by this question. Best I can do for now; I'll know better next time. It's definitely been educational.

fyngyrz
  • Communication with the UI in Qt should be performed only by signals and slots, which will protect you from multithreading mistakes. Also, the Qt network part is designed in such a way that threading is not needed. – Marek R Jan 04 '16 at 09:03
  • I'm happy to pass the bounty to @shf. I think if he posts an answer I can then start a bounty and award it. But I don't see any answer ...? SO bounty system is indeed a minefield! – Roddy Jan 07 '16 at 10:12
  • That's very kind of you. He hasn't gotten back to me - I left a couple of pointers for him. New Year's vacation, perhaps. :) – fyngyrz Jan 07 '16 at 16:17
5

Not really enough to work on in terms of code, but I'd look at these things:-

  • How do you know you have a memory leak?
  • How do you know it's not actually heap corruption?
  • There's not a 'new' or 'delete' anywhere in sight. If you're not using them, then the 'leak' is likely in the TCP handling.
  • Sockets : Try closing this and re-opening every so often. Does the leak get cleaned when you do that?
  • You read into RBuf but then assemble from Buf ...?
  • What type is RBuf? Why no bounds checking on the amount you read into it?
  • Wireshark - Look at what's being sent/received on your socket - anything unusual going on there? Or anything going to OTHER sockets?
  • Are you actually reading the bytes from the socket? Check the return value from read, and see this question.
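
The last few points (bounds checking and checking the return value of the read) could look something like the sketch below. It is written as a template so that it would accept anything exposing `bytesAvailable()` and `read(char*, maxSize)` in the style of `QIODevice`; `drainBounded` and `FakeDevice` are hypothetical names, and `FakeDevice` only stands in for a socket so the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a socket: holds pending bytes in a string.
struct FakeDevice {
    std::string pending;

    long long bytesAvailable() const {
        return static_cast<long long>(pending.size());
    }

    long long read(char* data, long long maxSize) {
        const long long n = std::min<long long>(maxSize, bytesAvailable());
        std::memcpy(data, pending.data(), static_cast<std::size_t>(n));
        pending.erase(0, static_cast<std::size_t>(n));
        return n;
    }
};

// Reads everything currently available without ever asking for more than
// `capacity` bytes at once. Passes each chunk to `handle(buf, count)`.
// Returns the total bytes consumed, or -1 if the device reports an error.
template <typename Device, typename Handler>
long long drainBounded(Device& dev, char* buf, long long capacity, Handler handle) {
    assert(buf != nullptr && capacity > 0);
    long long total = 0;
    while (dev.bytesAvailable() > 0) {
        const long long want = std::min<long long>(dev.bytesAvailable(), capacity);
        const long long got = dev.read(buf, want);
        if (got < 0) return -1;  // read error
        if (got == 0) break;     // nothing delivered; avoid spinning
        handle(buf, got);        // use the count actually read
        total += got;
    }
    return total;
}
```

The key differences from the loop in the question are that the request is clamped to the buffer size and that the handler receives the number of bytes actually read rather than the number that was available.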
Roddy
3

Important clues can be found in things that linearly affect the leak rate. You mentioned keepalive messages as one such thing, and I gather that those are sent, not received, by your application.

From the sending side, you show how you send a single message, but not how you manage the queue of outgoing messages. My suggestion therefore is to check if messages are properly removed after sending or if there is some other problem in managing that data structure.

2

There appears to be no leak in the code you show and describe.

Since Valgrind doesn't work, the next best thing is to try LeakSanitizer (http://clang.llvm.org/docs/LeakSanitizer.html) and/or AddressSanitizer (http://clang.llvm.org/docs/AddressSanitizer.html). Hell, run all of the sanitizers that you can, maybe something will come up.

Other than that, the only clue I get from the code is the handling of pMsg: how is it allocated and deallocated? We don't see code for that. Examine it or share it if you wish.

srdjan.veljkovic
1

There is no need to use multithreading. Check out my other answer. It matches your problem perfectly and will eliminate your multithreading issues.

Also, in Qt always use signals and slots. By default they protect code from cross-thread problems and have many more advantages.

Marek R
  • Threading is required here because otherwise far too much to do is packed into the GUI thread and the app cannot run successfully. In any case, it wasn't the TCP thread that was doing the GUI op anyway. So thanks, but no thanks. Problem is solved, as I said. – fyngyrz Jan 05 '16 at 16:46
  • You are totally wrong. Your application design is simply corrupted. Blocking a thread is always a bad solution. Also, in Qt, communication between threads should be performed by using signals and slots; this simplifies a lot and it is safe. [Here you have](http://blog.qt.io/blog/2010/06/17/youre-doing-it-wrong/) some hints about multithreading. – Marek R Jan 06 '16 at 10:15
  • You did, your code: `if (m_pTcpSocket->waitForReadyRead(100))` it is 100 milliseconds but still it is blocking. This would not pass code review in my company. – Marek R Jan 07 '16 at 10:51
  • That's not a block. That's a suspension. During that wait, the thread is *suspended* if there is nothing for it to do. The CPU returns to the core pool for other activity until either there is new data, or the timer times out. If there is data, it comes back immediately. It makes no sense for the code to loop without suspending itself if there is nothing for it to do. – fyngyrz Jan 07 '16 at 16:15