How can I debug a memory crash if the sanitizer doesn't show anything?

Question

I have a complex application that crashes on exit, and I can't reproduce the error with a minimal example. The crash happens at application exit, when the destructor of a class runs and a shared pointer member gets destroyed. What I'm basically doing is this:

// plugin (.so loaded at runtime)
// called during application run
void SomePluginClass::foo()
{
    auto ptr = std::make_shared<int>();
    libraryObj.bar(ptr);
}

// library (.so linked to the executable and the plugin)
// SomeLibraryClass.hpp
#include <memory>

class SomeLibraryClass
{
public:
    // ... some other code

    ~SomeLibraryClass();
    void bar(std::shared_ptr<int> ptr);

private:
    std::shared_ptr<int> m_ptr{};
};

// SomeLibraryClass.cpp
// called during application run
void SomeLibraryClass::bar(std::shared_ptr<int> ptr) { m_ptr = ptr; }

// called on application exit and cleanup
SomeLibraryClass::~SomeLibraryClass()
{
    // crash happens here
    // use_count shows 1
    // reset() used here for debugging purposes as it causes the same crash as implicit destructor call
    m_ptr.reset();
}

I ran the application under Valgrind and GCC's AddressSanitizer. Neither reports any problem while the application is running; both report an error only once the crash occurs. For example, here are some lines of the sanitizer's output:

==11744==ERROR: AddressSanitizer: SEGV on unknown address 0x7f56b3ba0c20 (pc 0x555ac6680ead bp 0x7ffc9d3ce920 sp 0x7ffc9d3ce910 T0)
==11744==The signal is caused by a READ memory access.
    #0 0x555ac6680eac in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/7/bits/shared_ptr_base.h:154
    #1 0x555ac6680b33 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/7/bits/shared_ptr_base.h:684
    #2 0x7f56e5e562cd in std::__shared_ptr<int, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/7/bits/shared_ptr_base.h:1123
    #3 0x7f56e5e56574 in std::__shared_ptr<int, (__gnu_cxx::_Lock_policy)2>::reset() /usr/include/c++/7/bits/shared_ptr_base.h:1235

What do the numbers (pc 0x555ac6680ead bp 0x7ffc9d3ce920 sp 0x7ffc9d3ce910 T0) mean?

What else can I do to find the crash source?

Why are you doing `m_ptr.reset();` in your destructor? The shared pointer will release its resources when it is destroyed as a normal part of member destruction for your object. Your class doesn't need a user-defined destructor *at all*. — Jesper Juhl, Jan 22 '20 at 16:29
`pc` = program counter (which instruction in the binary you are at), `bp` = base pointer (where your function local memory starts), `sp` = stack pointer (where you are on the stack). See also https://stackoverflow.com/questions/1395591/what-is-exactly-the-base-pointer-and-stack-pointer-to-what-do-they-point. — Max Langhof, Jan 22 '20 at 16:40
@JesperJuhl The line was added for debugging purposes. The resetting fires the same crash as an implicit destructor call. Updated the example. — nikitablack, Jan 22 '20 at 18:18
@MaxLanghof Thank you. Do you have any idea what `T0` means in that line? — nikitablack, Jan 22 '20 at 18:20

score 1 · Accepted Answer · answered Jan 26 '20 at 03:09

What do the numbers (pc 0x555ac6680ead bp 0x7ffc9d3ce920 sp 0x7ffc9d3ce910 T0) mean?

At the time of the crash (which was caused by trying to access address 0x7f56b3ba0c20), the values of the program counter (PC), frame pointer (BP) and stack pointer (SP) registers were 0x555ac6680ead, 0x7ffc9d3ce920 and 0x7ffc9d3ce910 respectively.

The program counter value corresponds to the std::_Sp_counted_base<...>::_M_release() function.

We have no idea where the crashing address 0x7f56b3ba0c20 came from. It's not near the current stack pointer, and it doesn't look like a heap address (though it could be), nor does it look like random garbage. ASan has no idea where this address came from either.
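The "not near the current stack pointer" observation can be checked with plain arithmetic on the values from the report. This is only a sketch: the helper names are made up for illustration, and the 8 MiB window is an assumption (a common default stack size on Linux), not something ASan itself uses.

```cpp
#include <cstdint>

// Assumption for illustration: treat anything within 8 MiB of the stack
// pointer as "near the stack" (a common default stack size on Linux).
constexpr std::uint64_t kStackWindow = 8ull * 1024 * 1024;

// Absolute distance between two addresses.
constexpr std::uint64_t addressDistance(std::uint64_t a, std::uint64_t b)
{
    return a > b ? a - b : b - a;
}

// Hypothetical helper: is addr plausibly inside the current thread's stack?
constexpr bool isNearStack(std::uint64_t addr, std::uint64_t sp)
{
    return addressDistance(addr, sp) <= kStackWindow;
}

// Values copied from the ASan report in the question.
constexpr std::uint64_t kFaultAddr = 0x7f56b3ba0c20;
constexpr std::uint64_t kBp = 0x7ffc9d3ce920;
constexpr std::uint64_t kSp = 0x7ffc9d3ce910;

static_assert(isNearStack(kBp, kSp), "bp sits 16 bytes above sp");
static_assert(!isNearStack(kFaultAddr, kSp), "fault address is far from the stack");
```

The frame pointer is 16 bytes away from the stack pointer, while the faulting address is hundreds of gigabytes away, so it is not a stack address of the crashing thread.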

One possible explanation: the address was on the heap, then it got deleted and moved to quarantine (which ASan uses to tell you about dangling accesses), but then the quarantine capacity was exceeded by other deletes, causing ASan to "forget" what it knew about that address (ASan can't keep info about every deleted memory block forever -- that would cause you to run out of memory).
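To make the "forgetting" concrete, here is a toy model of a bounded quarantine. It is not ASan's real implementation: the class `ToyQuarantine`, its methods, and the integer block IDs are all invented for illustration. It only shows why a fixed-capacity FIFO of freed blocks eventually loses track of old frees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only: NOT how ASan is actually implemented.
class ToyQuarantine
{
public:
    explicit ToyQuarantine(std::size_t capacityBytes) : m_capacity(capacityBytes)
    {
        assert(capacityBytes > 0);
    }

    // Record a freed block; evict the oldest blocks once capacity is exceeded.
    void onFree(int blockId, std::size_t size)
    {
        m_blocks.push_back({blockId, size});
        m_used += size;
        while (m_used > m_capacity && !m_blocks.empty())
        {
            m_used -= m_blocks.front().size;
            m_blocks.pop_front();
        }
    }

    // True if an access to this block could still be reported as dangling.
    bool remembers(int blockId) const
    {
        return std::any_of(m_blocks.begin(), m_blocks.end(),
                           [blockId](const Block& b) { return b.id == blockId; });
    }

private:
    struct Block { int id; std::size_t size; };
    std::deque<Block> m_blocks;
    std::size_t m_capacity;
    std::size_t m_used = 0;
};
```

With a capacity of 100 bytes, freeing two 60-byte blocks evicts the first one, so a later access to it would look like an access to an unknown address. With a capacity of 1000 bytes both blocks are still remembered, which is the effect a larger quarantine is meant to have.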

You can try increasing the size of the ASan quarantine buffer with:

ASAN_OPTIONS=quarantine_size_mb=4096

(The default is 256; increase it until you run out of memory or until ASan tells you that you are in fact accessing dangling memory.)

score 0 · Answer 2 · answered Jan 22 '20 at 16:55

If you want to destroy a resource that was allocated outside the library when destructing the SomeLibraryClass object, you are messing with the resource ownership. You should not do that. If you just want to release your shared ownership of the object managed by ptr, you do not need to call 'm_ptr.reset()' at all.

Meng Tian

You should understand that this is an example showing the issue. The code by itself is valid and should not cause a crash. I added the pointer reset for debugging purposes. Updated the example. – nikitablack Jan 22 '20 at 18:22
This does not answer the OP's question. – yugr Jan 23 '20 at 08:08
