
Summary:

I have a class that implements shared memory using Boost Interprocess. A segmentation fault occurs in a method, read(), when it accesses the named_condition. When I inspect the nearby std::string class members in GDB, they are corrupted too. One contains the value of a string literal that I only ever pass to a third-party logging macro.

I am going to describe the problem, show you the code which sets the intended string values, and then ask whether I am causing the problem or whether the third-party logger is.

I am using Clang (not GCC).

This problem is incredibly difficult to reproduce. It occurs roughly once in every 300+ runs.

Details:

This is my object containing the Boost Interprocess named_condition and nearby std::strings which I noticed had corrupted values:

char                                        a[1000];     
std::shared_ptr<bip::named_mutex>           mutex{nullptr};
char                                        b[1000];     
MyVector*                                   vec{nullptr};
std::shared_ptr<bip::managed_shared_memory> segment{nullptr};
std::shared_ptr<bip::named_condition>       cond_empty;                 // Seg fault 
bool                                        destroy_memory{false};
std::string                                 shared_vector_name;         // Weird value
std::string                                 shared_mutex_name;          // Weird value 
std::string                                 shared_cv_name;             // Weird value
std::string                                 shared_memory_name;         // Weird value
std::string                                 tag_name;

The two char arrays were added weeks ago as guards to detect memory corruption (probably the cause of this same segmentation fault).
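For reference, a guard of this kind can be sketched as below. This is only an illustration, not the actual checkMemory() from my code: the `Canary` type, the `kCanaryPattern` fill value and the `firstCorruptByte()` helper are hypothetical names made up for the example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Assumed fill value; any pattern unlikely to occur by accident would do.
constexpr unsigned char kCanaryPattern = 0xA5;

// Hypothetical stand-in for the char a[1000] / char b[1000] guard members.
struct Canary
{
    std::array<unsigned char, 1000> bytes;

    // Fill the whole guard region with the known pattern on construction.
    Canary() { bytes.fill(kCanaryPattern); }

    // Returns the index of the first overwritten byte, or -1 if the guard is intact.
    std::ptrdiff_t firstCorruptByte() const
    {
        const auto it = std::find_if(bytes.begin(), bytes.end(),
            [](unsigned char c) { return c != kCanaryPattern; });
        return it == bytes.end() ? -1 : it - bytes.begin();
    }
};
```

Reporting the index rather than a plain true/false may help show whether an overwrite starts at the beginning or the end of the guard, which hints at the direction it came from.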

The segmentation fault occurs when I try to access the named_condition when attempting to read the shared memory:

std::vector<T> read(const bool clearAfterReading, const bool readImmediately = false)
{
    checkMemory();   // Loops over 'a' and checks for corruption
    std::vector<T> readItems;
    std::cout << "Reader trying to obtain mutex" << std::endl;

    bip::scoped_lock<bip::named_mutex> lock(*sdc.mutex);

    if(sdc.vec->empty() && readImmediately == false)
    {
        std::cout << "Reader waiting. vec size: " << sdc.vec->size() << std::endl;
        sdc.cond_empty->wait(lock);       //SEG FAULT OCCURS HERE

When I look at the state of sdc.cond_empty in GDB I notice m_base = 0x0:

(gdb) p *(sdc.cond_empty._M_ptr)
$12 = {m_cond = {m_shmem = {<boost::interprocess::ipcdetail::managed_open_or_create_impl_device_holder<false, boost::interprocess::shared_memory_object>> = {<No data fields>}, static ManagedOpenOrCreateUserOffset = 16, 
      m_mapped_region = {m_base = 0x0, m_size = 104, m_page_offset = 0, m_mode = boost::interprocess::read_write, m_is_xsi = false}}}}

So am I correct to assume that this named_condition has been corrupted?

I then decided to check the values of surrounding std::string class members (scroll right to end):

(gdb) p sdc.shared_vector_name
$22 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fff80a0d438 "HandleRunRequest()"}}
(gdb) p sdc.shared_mutex_name
$23 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fff80a0d468 "File exists"}}
(gdb) p sdc.shared_cv_name
$24 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fff80a0d498 "/abc_shared_memory"}}
(gdb) p sdc.shared_memory_name
$25 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fff80a0d408 ""}}
(gdb) p sdc.tag_name
$26 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fffc021d598 "abc"}}

Their values are completely wrong. They appear to come from string literals that I log via a third-party macro.

The intended std::string values are set via the following code:

This const:

extern "C" 
{
    const std::string ABC_SH_MEM_NAME = "abc";
}

is passed through the shared memory reader constructor:

sharedMemReader(CONSTS::ABC_SH_MEM_NAME, CONSTS::ABC_SH_MEM_SIZE, true),

via the tag argument:

SharedDataReader(const std::string& tag, const int numBytes, const bool destroyMemory)
{
    sdc.initialise(tag, numBytes, destroyMemory);

so hard-coded strings can be appended to tag and assigned to the aforementioned string class members:

void initialise(const std::string& tag, const int numBytes, const bool ownMemory)
{
    std::cout << std::endl << "Started initialisation..." << std::endl;

    const std::string sharedMemoryName = tag + "_shared_memory";
    const std::string sharedVectorName = tag + "_shared_vector";
    const std::string sharedMutexName = tag + "_shared_mutex";
    const std::string sharedCVName = tag + "_shared_cv";

    tag_name = tag;
    shared_memory_name = sharedMemoryName;
    shared_mutex_name = sharedMutexName;
    shared_vector_name = sharedVectorName;
    shared_cv_name = sharedCVName;
    destroy_memory = ownMemory;

As mentioned previously, shared_vector_name is corrupted with the value "HandleRunRequest()".

The only place I use this string is in a call to a third-party logging macro:

void MyClass::HandleRunRequest() 
{
    THIRD_PARTY_LOG_MACRO("HandleRunRequest()");

However, this method is not called immediately before read(), where the seg fault occurs. It calls a few other methods and then spawns a new thread, which in turn calls read().

Is there anything I could be doing to cause this corruption? Or is the third-party logger writing values to arbitrary addresses?

I still have the core dump and library, but please be aware this problem is incredibly difficult to reproduce.

user997112
  • Is it possible to provide something people can try and fiddle around with in their own environment? – Passer By Jul 20 '18 at 16:43
  • Maybe the `sdc` object was deleted? This could explain why you see random values in its member. – Olivier Sohn Jul 20 '18 at 17:01
  • @OlivierSohn I don't think so, only because it's a class member of the class which calls read(), so should be alive. – user997112 Jul 20 '18 at 17:21
  • @user997112 well without seeing the rest of the code it's difficult to say. But maybe the parent object is deleted too. – Olivier Sohn Jul 20 '18 at 18:35
  • @OlivierSohn The architecture is that a parent class, written by me, inherits from the third-party interface. My parent class contains the shared memory reader. I'm fairly confident this hasn't been destructed, but I will try to add some checks. – user997112 Jul 20 '18 at 18:39
  • Or maybe it's a buffer overflow, I think you could detect this by setting a gdb breakpoint on memory write to see which call is overwriting the content of the members: https://stackoverflow.com/questions/58851/can-i-set-a-breakpoint-on-memory-access-in-gdb – Olivier Sohn Jul 20 '18 at 18:46
  • @OlivierSohn The problem I have is that I don't start my application; I start the third party's platform and it runs my application. So it's hard to run it under GDB. Is there anything I can learn from the fact that its value was a string literal? Where are string literals stored, etc.? Is this a clue? I forgot to add this to the question, but this is the Clang compiler. – user997112 Jul 20 '18 at 18:57
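
On the question of where string literals live: a string literal has static storage duration (on typical ELF platforms it sits in a read-only data section of the executable or shared library), whereas a std::string constructed from it always copies the characters into its own buffer. The small check below illustrates that distinction; `ownsSeparateCopy` is a hypothetical helper written only for this example.

```cpp
#include <cstring>
#include <string>

// Returns true when the std::string holds the same characters as the literal
// but stores them at a different address, i.e. it owns its own copy.
inline bool ownsSeparateCopy(const std::string& s, const char* literal)
{
    return s.data() != literal && std::strcmp(s.c_str(), literal) == 0;
}
```

If the same holds here, the "HandleRunRequest()" text seen through `_M_p` would be a copy made by some std::string (possibly a temporary created inside the logging macro) rather than the literal itself. That would be consistent with the members pointing at memory that has since been reused, though it does not prove it.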
