TL;DR: How do I automatically add a watchpoint in gdb when a function is called, so I can debug some memory corruption?
I am currently dealing with some memory corruption in C++. I am mostly seeing 4-5 types of recurring crashes, all of which make little to no sense, so I'm guessing they are related to memory corruption.
These crashes only happen on the production server, roughly every 2-5 hours. Most of them consist of accessing or passing a null pointer in a place where a null pointer can't possibly have existed in the first place. One of these places is a lambda capturing this (see below).
Things I have tried so far:
- Obviously looked at core dumps, and even had gdb attached while it crashed.
- valgrind: I've spent hours staring at multiple instances of valgrind with no success.
- Enabled GCC's stack protection (-fstack-protector-all).
- Looked over the code and the changes, but it has been impossible for me to find anything (100k lines of code total, "On master, 10,437 files have changed and there have been 3,352,600 additions and 85,495 deletions." since the last release on the production server). I might have just plain missed something, or not looked in the right spots; I can't tell.
- Used cppcheck to see if there was something plainly obvious wrong with the code.
If there is an easier or more straightforward method of finding where the corruption occurs, feel free to suggest that too.
Let's look at some simplified code. I have a class, Socket, which manages a client connection. It is constructed something like this:
void Listener::OnAccept(int fd) {
    Socket* s = new Socket();
    if (s->Setup(fd)) {
        // push into a vector and do some other things
    }
}
Socket::Setup calls (virtual) OnConnect of the Socket class, which then creates a ping event, using a lambda:
void Socket::OnConnect() {
    m_pingEvent = new Event([this](Event* e) {
        if (!this->GotPong()) {
            // close connection
        } else {
            this->Ping();
        }
    }, 30 /* seconds */, true /* loop */);
}
Event accepts a std::function as the callback. m_pingEvent is deleted in the destructor (if set), which cancels the event if it is running.
What happens (rarely) is that the lambda calls Ping on a nullptr, which in turn calls m_pingPacket->Send() with this=0x1f8, which leads to a segfault.
My question, or rather my proposed solution, would be to watch the captured this pointer for writes, which definitely shouldn't happen. There is only one small issue with that.
How would I even watch such a large number of pointers without manually adding each one? (There are about 400 concurrent connections, with a lot of connects and disconnects.)
As for the captured data, I found that this is stored in the __closure object:
(gdb) frame 2
#2 0x081b9d63 in operator() (e=0x9b2a748, __closure=0xb5a8318)
at net/socket/Client.cpp:151
151 net/socket/Client.cpp: No such file or directory.
(gdb) ptype __closure
type = const struct {
net::socket::Client * const __this;
} * const
I can easily get at this when creating the lambda by first assigning the lambda to "auto callback = ", which will be of type:
(gdb) info locals
callback = {__this = 0xb4dd0948}
(gdb) ptype callback
type = struct {
net::socket::Client * const __this;
}
(gdb) print callback
$1 = {__this = 0xb4dd0948}
(This is gcc version 4.7.2 (Debian 4.7.2-5) for reference; it might be different with other compilers/versions.) Shortly before posting I realized the struct would probably change address once it is moved into the std::function (is this correct?). I've been digging through the GNU "functional" header, but I haven't really been able to find anything yet. I'll keep looking (and updating this).
Another note: I am posting this full description with all of the details included in case anyone has an easier solution for me (XY problem).
Edit:
(gdb) print *(void**)m_pingEvent->m_callback._M_functor._M_unused._M_object
$8 = (void *) 0xb4dd56d8
(gdb) print this
$4 = (net::socket::Client * const) 0xb4dd56d8
Found it :)
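On whether the closure changes address when it goes into the std::function: std::function stores its own copy of the callable, so the stored closure is a different object from the local callback variable. Below is a minimal sketch of how one might confirm this from C++ with the standard std::function::target<T>() accessor. Client, m_callback and InstallAndCompare are hypothetical names used for illustration only (not the real net::socket::Client), and the check assumes RTTI is enabled, since target() relies on type information.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for net::socket::Client.
struct Client {
    std::function<void()> m_callback;
    int pings = 0;

    // Installs a [this]-capturing lambda into m_callback and returns true
    // if the closure held by m_callback is a different object than the
    // local variable it was copied from.
    bool InstallAndCompare() {
        auto callback = [this]() { ++pings; };
        using Closure = decltype(callback);

        m_callback = callback;  // std::function stores its own copy

        // target<T>() returns a pointer to the stored callable, or nullptr
        // if the stored callable is not of type T.
        const Closure* stored = m_callback.target<Closure>();
        assert(stored != nullptr);

        return static_cast<const void*>(stored) !=
               static_cast<const void*>(&callback);
    }
};
```

Calling InstallAndCompare() on a Client is expected to return true, which is consistent with the gdb session above: the pointer reached through _M_functor lives inside m_callback, not in the local variable, so that is the location a watchpoint would have to cover.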
Edit2:
break net/socket/Client.cpp:158
commands
silent
watch -l m_pingEvent->m_callback._M_functor._M_unused._M_object
continue
end
This has two flaws: you can only watch 4 addresses at a time, and there is no way to delete the watchpoint once the object is freed. So it's unusable.
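Because hardware watchpoints are this limited, one possible complement is a software check inside the callback itself. The following is only a sketch under assumptions, not something taken from the original code: GuardedPtr, kCanary and MakePingCallback are hypothetical names, and Client is a stand-in for the real class. The captured pointer is stored between two canary words together with an inverted copy of itself, so an overwrite of the closure can be detected the next time the callback runs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical canary value; any unlikely bit pattern would do.
const std::uint32_t kCanary = 0xC0FFEE11u;

// Hypothetical helper: a pointer guarded by canaries and an inverted copy.
template <typename T>
struct GuardedPtr {
    std::uint32_t head;    // canary before the pointer
    T* ptr;                // the pointer that would otherwise be captured raw
    std::uintptr_t check;  // bitwise complement of ptr
    std::uint32_t tail;    // canary after the pointer

    explicit GuardedPtr(T* p)
        : head(kCanary), ptr(p),
          check(~reinterpret_cast<std::uintptr_t>(p)), tail(kCanary) {}

    // True while neither the canaries nor the pointer have been overwritten.
    bool Intact() const {
        return head == kCanary && tail == kCanary &&
               check == ~reinterpret_cast<std::uintptr_t>(ptr);
    }
};

// Hypothetical stand-in for the real client class.
struct Client {
    int pings = 0;
    void Ping() { ++pings; }
};

// Builds a callback that verifies its captured pointer before using it.
// Returns false instead of pinging when the guard has been damaged; real
// code might call abort() there to get a core dump at the detection point.
std::function<bool()> MakePingCallback(Client* c) {
    assert(c != nullptr);
    GuardedPtr<Client> self(c);
    return [self]() -> bool {
        if (!self.Intact()) return false;
        self.ptr->Ping();
        return true;
    };
}
```

This only reports that the closure was overwritten by the time the callback next runs; it does not identify the writer. It can still help separate "the captured pointer was clobbered" from "the object it points to is stale", and it scales to any number of connections.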
Edit 3: I've figured out how to do the watching using this python script I wrote (linking this one externally since it's quite long): https://gist.github.com/imermcmaps/4a6d8a1577118645acf3
The next issue is making sense of the output:
Added watch 7 -> 0x10eb2200
Hardware watchpoint 7: -location m_pingEvent->m_callback._M_functor._M_unused._M_obj
Old value = (void *) 0x10eba4b0
New value = (void *) 0x10eba400
net::Packet::Packet (this=0x10eb1088) at ../shared/net/Packet.cpp:13
It's saying the value changed from an old value that shouldn't even have been the original value: when adding the watch I check that the this pointer and the watched pointer value match, and they do.
Edit 4 (yay): Turns out watch -l doesn't work the way I want it to. Manually grabbing the address and then watching that address seems to work.