I have a section of code that runs in the ROS run loop with three lambdas: two that set flags and one that is polled while the ROS run loop spins until the topics reach a particular condition. It is essentially this:

bool wait_until_engine_started()
{
    bool engine_started_state = false;
    bool engine_started_event = false;

    // Pseudocode for a subscription wrapper which invokes the below lambda
    auto engine_state_checker =
            [&](const EngineRosMessage::ConstPtr& msg) {
                // Pseudocode for a validation function
                if (engine_appears_to_be_on(msg))
                {
                    // Modify the referenced boolean here
                    engine_started_state = true;
                }
                // Return to runloop
            }
        );

    // Pseudocode for a subscription wrapper which invokes the below lambda
    auto engine_code_checker = 
            [&](const DifferentEngineRosMessage::ConstPtr& msg) {
                // Pseudocode for a validation function
                if (engine_appears_to_be_on_via_different_method(msg))
                {
                    // Modify the referenced boolean here
                    engine_started_event = true;
                }
                // Return to runloop
            }
        );

    return utils::spin_until_condition([&](){
        return engine_started_state && engine_started_event;
    });
}

with utils::spin_until_condition implemented as:

bool spin_until_condition(std::function<bool()> condition)
{
    while(ros::ok() && !condition())
    {
        ros::spinOnce();
    }
    return ros::ok();
}

I am hitting a segfault in some cases inside the lambda passed to spin_until_condition, but only when specific sections of code unrelated to this one are included.

Probing in GDB shows that on my machine

  • at the level of the engine_started_event declaration the address of engine_started_event is 0x7fffffffc3ff
  • inside the lambda engine_code_checker the address of engine_started_event is 0x7fffffffc3ff
  • inside the rvalue lambda passed to spin_until_condition the address of engine_started_event is initially 0x7fffffffc3ff, but after engine_started_event = true executes it becomes 0x1007fffffffc3ff, at which point the segfault occurs

The crash reliably disappears when I remove a particular block of code that is unrelated to this one. Furthermore, the section of code above is executed twice - once before the problem-causing block and once after - and the issue occurs only on the second pass.

AFAIK there is no reason a reference should ever change its address, and the fact that removing the problem block reliably fixes things makes me think it's responsible, but I can't see how the two could interact given that the booleans and the 3rd lambda are stack-allocated variables.

I'm running this on gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11). Building with gcc-7 did not cause a segfault, which makes me suspect a compiler whoopsie. But I've learnt time and again that the compiler is usually pretty good at its job, and the fact that removing our code removes the issue points strongly at our code. My guess now is that a bad memory write in the unrelated code somehow changes the reference.

Valgrind also did not show anything about it apart from the actual segfaulting access at 0x1007fffffffc3ff.

So - the TL;DR

  • How can the address of a lambda's reference capture change in the way shown above (including strange cases of bad memory access)?
  • Are there any sensible ways of debugging this sort of situation so that I can catch the offending code writing to where this reference lives?
  • Or is this a compiler whoopsie?
Andrew Lipscomb
  • The address of the object a reference refers to can never change, just like the address of an object cannot change. Either the referred object's address isn't actually changing (misuse of tools or tool error) or you have Undefined Behavior. – François Andrieux Jun 09 '19 at 23:42
  • I'm assuming you miscopied the code, because this won't compile (two unmatched `)`). But from what you've shown, you never call the first two lambda expressions, you just create them so your local `bool`s can never change, so it seems like your spin loop will spin forever. If it has no [forward progress](https://en.cppreference.com/w/cpp/language/memory_model#Forward_progress) in that loop, that infinite loop may also be Undefined Behavior. – François Andrieux Jun 09 '19 at 23:44
  • It would be a lot easier to answer this question if you provided a compilable example. Please read about [MCVE]s. – François Andrieux Jun 09 '19 at 23:46
  • @FrançoisAndrieux yep, you are right - I wasn't able to get this down to a reproducible example. It'd have to be heap corruption; I just needed to see if there were any more sane possibilities that I hadn't considered. The block of code that I remove is not my code, so I am stuck for options. Thanks in any case – Andrew Lipscomb Jun 13 '19 at 10:37

1 Answer

This kind of behaviour is probably heap corruption. The block you are removing, which seems unrelated, is most probably accessing memory it shouldn't. Check your loops for writes past the bounds of an array, accesses to deleted memory, or double deletions.

What is probably happening is that this piece of code overwrites the memory where the lambda object lives, which ends up changing its captured value.

Exaila
  • This is most likely it - but is there any way of capturing the bad access in valgrind (or gdb, or something similar)? I'm at a loss unless I can find the point in the code that actually accesses the address. I'm guessing valgrind only picks it up if it's accessing unallocated memory – Andrew Lipscomb Jun 13 '19 at 10:40
  • There is a way to detect it using custom allocators or a custom new operator. The idea is that every time you allocate, you add a guard area in memory before and after the block. Afterwards, you can detect whether anything wrote to that guarded area. Here is an example of that: https://stackoverflow.com/questions/33891513/find-out-where-heap-memory-gets-corrupted Hope that helps. – Exaila Jun 16 '19 at 21:05