
TL;DR: How do I automatically add a watch in gdb when a function is called so I can debug some memory corruption?

I am currently dealing with some memory corruption in C++. I am mostly seeing 4-5 types of recurring crashes, all of which make little to no sense, so I'm guessing it has to be related to memory corruption.

These crashes only happen on the production server, roughly every 2-5 hours. Most of them consist of accessing or passing a null pointer in a place where one can't possibly have existed in the first place. One of these places is a lambda capturing this (see below).

What I have tried so far:

  • Looked at core dumps, and even had gdb attached while it crashed.
  • valgrind: I've spent hours staring at multiple instances of valgrind with no success.
  • Enabled gcc's stack protection (-fstack-protector-all).
  • Looked over the code and the changes, but it has been impossible for me to find anything (100k lines of code total; "On master, 10,437 files have changed and there have been 3,352,600 additions and 85,495 deletions." since the last release on the production server). I might have just plain missed something, or not looked in the right spots; I can't tell.
  • Used cppcheck to see if there was something plainly wrong with the code.

If there is an easier or more straightforward method of finding where the corruption occurs, feel free to suggest that too.

Let's look at some simplified code. I have a class, Socket, which manages a client connection. It is constructed something like this:

void Listener::OnAccept(int fd){
    Socket* s = new Socket();
    if (s->Setup(fd)){
        // push into a vector and do some other things
    }
}

Socket::Setup calls (virtual) OnConnect of the Socket class, which then creates a ping event, using a lambda:

void Socket::OnConnect(){
    m_pingEvent = new Event([this](Event* e){
        if (!this->GotPong()){
            // close connection
        }else{
            this->Ping();
        }
    }, 30 /*seconds*/, true /* loop */);
}

Event accepts a std::function as the callback. m_pingEvent is deleted in the destructor (if set), which cancels the event if it is running.
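To make the ownership described above concrete, here is a minimal sketch of that arrangement. It is not the real code: g_liveEvents, Fire(), pings() and the counter are hypothetical stand-ins for the actual event loop and ping logic, and the sketch only assumes that destroying an Event is what cancels it.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

class Event;
// Hypothetical stand-in for the event loop: the list of scheduled events.
static std::vector<Event*> g_liveEvents;

class Event {
public:
    using Callback = std::function<void(Event*)>;

    Event(Callback cb, int intervalSeconds, bool loop)
        : m_callback(std::move(cb)), m_interval(intervalSeconds), m_loop(loop) {
        g_liveEvents.push_back(this);  // "schedule" the event
    }

    ~Event() {
        // "Cancel" the event so the loop can never fire it again.
        g_liveEvents.erase(
            std::remove(g_liveEvents.begin(), g_liveEvents.end(), this),
            g_liveEvents.end());
    }

    Event(const Event&) = delete;
    Event& operator=(const Event&) = delete;

    void Fire() { m_callback(this); }  // what the loop would call on timeout
    int interval() const { return m_interval; }
    bool loops() const { return m_loop; }

private:
    Callback m_callback;
    int m_interval;
    bool m_loop;
};

class Socket {
public:
    Socket() = default;
    ~Socket() { delete m_pingEvent; }  // deleting a null pointer is a no-op

    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;

    void OnConnect() {
        // The closure stores a copy of `this`; it is only valid while the
        // Socket is alive, which the destructor above is meant to guarantee.
        m_pingEvent = new Event([this](Event*) { ++m_pings; }, 30, true);
    }

    int pings() const { return m_pings; }

private:
    Event* m_pingEvent = nullptr;
    int m_pings = 0;
};
```

With this shape, a callback can only see a dead Socket if an Event outlives its owner or if the stored closure is overwritten, which is why the captured pointer is the thing worth watching.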

What happens (rarely) is that the lambda calls Ping with this being a nullptr. Ping then calls m_pingPacket->Send() with this=0x1f8, which leads to a segfault.

My question, or rather my proposed solution, would be to watch the captured this pointer for writes, which definitely shouldn't happen. There is only one small issue with that.

How would I even watch such a large number of pointers without manually adding each one? (There are about 400 concurrent connections, with a lot of connects and disconnects.)

As for the captured data, I found that it is stored in the __closure object:

(gdb) frame 2
#2  0x081b9d63 in operator() (e=0x9b2a748, __closure=0xb5a8318)
at net/socket/Client.cpp:151
151     net/socket/Client.cpp: No such file or directory.
(gdb) ptype __closure
type = const struct {
    net::socket::Client * const __this;
} * const

I can easily get at this when creating the lambda by first assigning it to a local variable ("auto callback = "), which will be of type:

(gdb) info locals
callback = {__this = 0xb4dd0948}
(gdb) ptype callback
type = struct {
    net::socket::Client * const __this;
}
(gdb) print callback
$1 = {__this = 0xb4dd0948}

(This is gcc version 4.7.2 (Debian 4.7.2-5) for reference; it might be different with other compilers/versions.)

Shortly before posting I realized the struct would probably change address once it is moved into the std::function (is this correct?). I've been digging through the GNU "functional" header, but I haven't been able to find anything yet. I'll keep looking (and updating this).
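On the question of whether the closure changes address: std::function keeps its own copy of the callable, so the object worth watching is that stored copy, not the local variable. The sketch below checks this with std::function::target, which needs RTTI enabled. ClosureReport and InspectClosure are hypothetical names used only for this illustration, and Client here is an empty stand-in for the real class.

```cpp
#include <functional>

// Hypothetical helper type: what the experiment observed.
struct ClosureReport {
    bool targetFound;          // target<T>() returned the stored closure
    bool storedAtSameAddress;  // stored copy lives where the local one does
    bool sameCapturedPointer;  // stored copy still holds the same `this`
};

struct Client {
    ClosureReport InspectClosure() {
        // Local closure object, like "auto callback = ..." in the question.
        auto callback = [this]() { return this; };

        // Constructing the std::function copies the closure into storage
        // owned by the std::function.
        std::function<Client*()> fn = callback;

        // Pointer to the copy held inside fn.
        auto* stored = fn.target<decltype(callback)>();

        ClosureReport report;
        report.targetFound = (stored != nullptr);
        report.storedAtSameAddress = (stored == &callback);
        report.sameCapturedPointer = (stored != nullptr) && ((*stored)() == this);
        return report;
    }
};
```

Calling Client().InspectClosure() should report that the target was found, that it does not share the local variable's address, and that the captured pointer is unchanged. Whether the copy sits inside the std::function object or on the heap is an implementation detail of the standard library.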

Another note: I am posting this full description with all of the details included in case anyone has an easier solution for me. (XY Problem)

Edit:

(gdb) print *(void**)m_pingEvent->m_callback._M_functor._M_unused._M_object
$8 = (void *) 0xb4dd56d8
(gdb) print this
$4 = (net::socket::Client * const) 0xb4dd56d8

Found it :)

Edit2:

break net/socket/Client.cpp:158
commands
silent
watch -l m_pingEvent->m_callback._M_functor._M_unused._M_object
continue
end

This has two flaws: you can only watch 4 addresses at a time (x86 has only four hardware debug address registers), and there is no way to delete the watch once the object is freed. So it's unusable.
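One way around the four-watchpoint limit is to do the checking in software instead. This is a different technique from the gdb approach in this question: rather than trapping the write itself, it records the expected value of every pointer slot and re-checks them all, for example once per event-loop iteration, so the first failing check narrows down when the corruption happened. PointerGuard and its methods are hypothetical names, and the sketch assumes a single-threaded program in which every slot is untracked before its memory is freed.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical registry mapping the address of a pointer slot to the value
// that slot must keep for as long as it is tracked.
class PointerGuard {
public:
    // Start tracking a slot and remember its current value.
    void Track(void* const* slot) { m_expected[slot] = *slot; }

    // Stop tracking; must be called before the slot's memory is freed.
    void Untrack(void* const* slot) { m_expected.erase(slot); }

    // Number of tracked slots whose value differs from the recorded one.
    std::size_t CountCorrupted() const {
        std::size_t bad = 0;
        for (const auto& entry : m_expected) {
            if (*entry.first != entry.second) {
                ++bad;
            }
        }
        return bad;
    }

private:
    std::unordered_map<void* const*, void*> m_expected;
};
```

In the setup from the question, the slot would be the location inside the stored closure that holds the captured this, tracked when the Event is created and untracked in its destructor. Breaking in gdb as soon as CountCorrupted() returns non-zero gives a much smaller window to search than a crash hours later.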

Edit 3: I've figured out how to do the watching using this python script I wrote (linking this one externally since it's quite long): https://gist.github.com/imermcmaps/4a6d8a1577118645acf3

The next issue is making sense of the output:

Added watch 7 -> 0x10eb2200
Hardware watchpoint 7: -location m_pingEvent->m_callback._M_functor._M_unused._M_obj

Old value = (void *) 0x10eba4b0
New value = (void *) 0x10eba400
net::Packet::Packet (this=0x10eb1088) at ../shared/net/Packet.cpp:13

It says the value changed from an old value that shouldn't even be the original value: I check that the this pointer and the watched pointer value match when adding the watch, and they do.

Edit 4 (yay): Turns out watch -l doesn't work the way I want it to. Manually grabbing the address and then watching that address seems to work.

imer

1 Answer


How do I automatically add a watch in gdb when a function is called so I can debug some memory corruption?

Memory corruption is often detected only long after the real corruption has occurred, and the corrupting write can come from any module loaded into your process, including third-party libraries. That makes manual debugging of limited use in large, complex projects. From your post it looks like the problem is not always reproducible, which suggests a threading/synchronization problem that leads to some sort of memory corruption. Based on my experience, I strongly suggest you concentrate on reproducing the problem under dynamic analysis tools (Valgrind/Helgrind).

You mentioned in your question that you are able to run your program under Valgrind. In case you have not done it this way yet, you may want to start your program (a.out) like this:

$ valgrind --tool=memcheck --db-attach=yes ./a.out

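This way Valgrind automatically attaches a debugger (GDB) to your program when the first memory error is detected, so that you can do live debugging. This seems to be the best way to find the root cause of your problem. Note that --db-attach was removed in Valgrind 3.11; on newer versions, use --vgdb-error=1 and connect from gdb with "target remote | vgdb".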

However, I think there may be a data race that is leading to the memory corruption. So you may want to use Helgrind to find data races or other threading problems that might be causing this.

For more information on these, you may refer to the following posts:

https://stackoverflow.com/a/22658693/2724703

https://stackoverflow.com/a/22617989/2724703

Mantosh Kumar
  • As mentioned, I have already looked at this in valgrind with gdb attached. No invalid writes, double frees or the like until the final crash happens. The only thing I'm getting is one uninitialized memory error, and that's due to me only filling out the values that get used in a big packet which gets sent once at the start of the program. What I forgot to mention is that the program is single-threaded. – imer Jun 21 '14 at 07:00