-3

At a certain point in my program, I am getting a segmentation fault at a line like the one in g() below:

// a data member type in MyType_t below
enum class DataType {
    BYTE, SHORT, INT, VCHAR, FLOAT, DOUBLE, BOOL, UNKNOWN
};

// a data member type in MyType_t below
enum class TriStateBool {
    crNULL, crTRUE, crFALSE
};

// this is the underlying type for std::vector used when 
// calling function f() from g(), which causes the segmentation fault
struct MyType_t {
    DataType    Type;
    bool        isNull;
    std::string sVal;
    union {
        TriStateBool  bVal;
        unsigned int  uVal;  
        int           nVal;
        double        dVal;
    };

    // ctors
    explicit MyType_t(DataType type = DataType::UNKNOWN) :
        Type{type}, isNull{true}, dVal{0}
    { }

    explicit MyType_t(std::string sVal_) : 
        Type{DataType::VCHAR}, isNull{sVal.empty()}, sVal{sVal_}, dVal{0}
    { }

    explicit MyType_t(const char* psz) : 
        Type{DataType::VCHAR}, isNull{psz ? true : false}, sVal{psz}, dVal{0}
    { }

    explicit MyType_t(TriStateBool bVal_) : 
        Type{DataType::BOOL}, isNull{false}, bVal{bVal_} 
    { }

    explicit MyType_t(bool bVal_) : 
        Type{DataType::BOOL}, isNull{false}, bVal{bVal_ ? TriStateBool::crTRUE : TriStateBool::crFALSE} 
    { }

    MyType_t(double dVal_, DataType type, bool bAllowTruncate = false) {
        // sets data members in a switch-block, no function calls
        //...
    }
};

void f(std::vector<MyType_t> v) { /* intentionally empty */ }    // parameter by-val

void g(const std::vector<MyType_t>& v) {
    //...
    f(v);    // <-- this line raises segmentation fault!!! 
             // (works fine if parameter of f() is by-ref instead of by-val)
    //...
}

When I inspect the debugger, I see that it originates from the sVal data member of MyType_t.

The code above does not reproduce the problem, so I don't expect any specific answers. I do however appreciate suggestions to get me closer to finding the source of it.

Here is a bit of context: I have a logical expression parser. The expressions can contain arithmetical expressions, late-bind variables and function calls. Given such an expression, parser creates a tree with nodes of several types. Parse stage is always successful. I have the problem in evaluation stage.

Evaluation stage is destructive to the tree structure, so the its nodes are "restored" using a backup copy before each evaluation. 1st evaluation is also always successful. I'm having the problem in the following evaluation, with some certain expressions. Even then, the error is not consistent.

Please assume that I have no memory leaks and double releases. (I am using an global new() overload to track allocations/deallocations.)

Any ideas on how I should tackle this?

Kemal
  • 849
  • 5
  • 21
  • 2
    could you post your code? – user2393256 Jul 01 '16 at 12:08
  • 3
    The same way you tackle any problem: by constructing a [MCVE]. – Lightness Races in Orbit Jul 01 '16 at 12:12
  • 5
    _"I am unable to demonstrate the problem, so I don't expect any specific answers"_ Then I'm afraid you're in the wrong place :) – Lightness Races in Orbit Jul 01 '16 at 12:14
  • 3
    "Any ideas on how I should tackle this?" -- Debugging 101. **Machete debugging**: Cut away code not immediately contributing to the problem. Check if problem still persists. If it does, cut away more code. If last change made problem go away, look at that change for answers. If none are found, cut away *different* piece of code not contributing. Lather, rinse, repeat. **Debugger**. **Logging**, writing object data for cases that fail / don't fail to give you an idea *why* this is happening.... all of this not something a Q&A site can be much help with. – DevSolar Jul 01 '16 at 12:15
  • Gut feeling, I'd start with looking at the constructors / destructors of `some_type`... – DevSolar Jul 01 '16 at 12:18
  • @DevSolar: One of the problems is that, even inserting a `std::cout << "test";` in main() seems to trigger the error, which may also mean that has nothing to do with it at all. I've been "machete debugging" for the last 2 days. So frustrating... – Kemal Jul 01 '16 at 12:26
  • @user2393256: I can't localize the problem for now, so have no idea which part of the code I should post. – Kemal Jul 01 '16 at 12:30
  • 1
    You need to keep going on the path to a [MCVE]. – Lightness Races in Orbit Jul 01 '16 at 12:41
  • @LightnessRacesinOrbit: You are more than right to point that out. I wish I could do better myself. :-( – Kemal Jul 01 '16 at 12:42
  • 1
    Here's another reason to post the real code, preferably as reduced as poss: As written, this simply can't compile, for you are calling `f(v)` with the wrong type of argument, and C++ is a type-safe language. So how do you _really_ call it? And anyway, what does `f()` **do** with the value it receives? It might be taking a reference to its parameter and stashing that somewhere with a longer lifetime than the function and hence than a by-value argument (local var), so pass-by-value would lead to horror. But why bother speculating? You've given nothing real to work with for answering the question – underscore_d Jul 01 '16 at 13:21
  • @underscore_d: You are all right. I really wish I could find a reasonable way to do it. I am having trouble with deciding what part of the code I should post, as I have no idea what part of it is relevant to the problem. I posted a reduced version. As for what `f()` does in the actual code, it does nothing. It is an empty function, but triggers segmentation fault anyway. – Kemal Jul 01 '16 at 13:25
  • @underscore_d: I corrected the definition of `f()`, by the way, as you kindly pointed out. – Kemal Jul 01 '16 at 13:31
  • @Kemal OK, I wasn't sure whether you had just omitted to post the body of the function. If the body is empty, then the only thing the function does is to construct its argument in the by-value case. This indicates the problem might - unless it's a red herring caused by UB elsewhere - be in the constructor for `MyType_t` (btw, `_t` suffixes are discouraged as they're reserved by POSIX) and/or its interaction with `vector` allocation, copying, etc. Are you able freely to perform such operations with `vector` elsewhere, without any segfault? – underscore_d Jul 01 '16 at 14:00
  • @Kemal I do have to emphasise that eliding stuff around the call to `f()` is a very bad idea because all of that omitted code has the potential to generate UB and thus to make diagnosis of the included code futile. As is, there is no way to deal with this other than for you to post a minimal, compilable, replicable example for us - or to stay where you are, roll your sleeves up, and get the debugger running... again. I find it hard to believe that 2 days of debugging with all the stack traces, etc that accompany that has produced _nothing_, but if so, I understand your frustration! – underscore_d Jul 01 '16 at 14:03
  • @underscore_d: `vector` is used heavily as a return type in the program. Where it is used as a parameter, it is almost exclusively by-ref. Under those circumstances, I never received segfault due to that before. – Kemal Jul 01 '16 at 14:15
  • @Kemal Returning by value may not actually invoke the constructors you think it does, due to RVO or move semantics. And I thought the problem occurs when passing as an argument by value, so the fact that it works elsewhere using by-ref, like it does here, doesn't seem relevant? – underscore_d Jul 01 '16 at 14:17
  • @underscore_d: I don't have any tool other than GDB for debugging. Stack traces up to the declaration point of `MyType_t`, but generates segfault before it even enters any of its ctors. I will provide ctor definitions in a minute. Please believe in me when I say I am TRYING to duplicate the error in a short enough code that I can post here. :-) – Kemal Jul 01 '16 at 14:27
  • @Kemal Can you post all of the following: stack traces, elided content before and after the call to `f()`, body of the `switch` constructor, version and options of your compiler. And if you're still having no luck getting this down to an MCVE, consider posting the entire sources somewhere if they're not proprietary. – underscore_d Jul 02 '16 at 11:59
  • 1
    I didn't read all of the comments, but this line: *"The code above does not reproduce the problem, so I don't expect any specific answers."* is a serious issue with the question. Questions like this are off-topic on Stack Overflow. You need to come up with a way to reproduce the problem in order to limit the scope and quantity of possible answers. – Cody Gray - on strike Jul 02 '16 at 12:17
  • @CodyGray Yup. See how much time we've wasted because it's taken this long to get to a point where we can see the problem - per my answer - since the OP had assumed the actual code wasn't important(!) and only progressively drip-fed new pieces. This is a textbook example of why an MCVE should be provided right out of the gate. Or are we psychic now? – underscore_d Jul 02 '16 at 12:36
  • @CodyGray: Obviously, I do take full blame for my sloppy question. But even then, most of what has been written was helpful in one way or another. I appreciate the efforts of everyone who took their time to contribute. – Kemal Jul 03 '16 at 19:19

2 Answers2

2

f(v); // <-- this line raises segmentation fault!!!

There are only two ways this line could trigger SIGSEGV:

  1. You've run out of stack (this is somewhat common for deeply recursive procedures). You can do ulimit -s unlimited, and see if the problem goes away.
  2. You have corrupted v and so its copy constructor triggers SIGSEGV.

Please assume that I have no memory leaks and double releases.

There are many more ways to corrupt heap (e.g. by overflowing heap block) than double release. It is very likely that (if you do not have stack overflow) valgrind or AddressSanitizer will point you straight at the problem.

Employed Russian
  • 199,314
  • 34
  • 295
  • 362
  • I have never used the tools you mentioned, and I will. I just would like to add that I am using GDB for debugging. Stack traces up to the declaration point of MyType_t, but generates segfault before it even enters any of its ctors. Also, stack is about 20 levels deep just before I get the error. – Kemal Jul 02 '16 at 01:32
  • 1
    @Kemal You should edit your question with the GDB stack trace and output from `x/i` and `info registers`. "I have never used tools" -- be prepared to find several bugs then. These tools are *indispensable*, you should use them often. – Employed Russian Jul 02 '16 at 06:49
  • I have a windows system, so it seems the way ahead is AddressSanitizer for now. Being a stranger to it, I found that setting it up is not that straightforward. I'll manage somehow, I think. Upside is that, shifting my focus helped, and I found the error: a ByRef capture of `v` in a **static** lambda. – Kemal Jul 03 '16 at 19:31
  • 1
    @Kemal Are you concerned about typically cited Linux tools not being available on Windows? If so, I strongly recommend trying MSYS2. It's a tracking port of Cygwin for precisely this purpose: to provide Linux/POSIX tools to build and debug native (unlike Cygwin) Win32 programs. I develop on Linux but use MSYS2 to test on Windows. It's great. In addition to providing native-linking ports of MinGW a.k.a. GCC targeting both 32 and 64-bit editions, it comes with the Arch Linux package manager `pacman` - whose repository includes `valgrind` et al and many other useful things (including GTKmm, wooo) – underscore_d Jul 03 '16 at 19:37
2

When I inspect the debugger, I see that it [the segfault] originates from the sVal data member of MyType_t.

Well, now that you've actually added most of the relevant code, we see why asking for help with no/insufficient code, because you just know the rest is A-OK, is a terrible idea:

Problem 1

explicit MyType_t(std::string sVal_) : 
    Type{DataType::VCHAR}, isNull{sVal.empty()}, sVal{sVal_}, dVal{0}
{ }

AWOOGA! Your initialiser for isNull is checking whether the member, sVal, is empty(), before you have initialised sVal from the parameter, sVal_. (N.B. Although they're the same in this case, the real order is that of declaration, not of initialisation.) You might think isNull will always be set true as it's always checking the initially default-constructed, empty member sVal, rather than the input sVal_...

But it's worse than that! The member isn't default-constructed, because the compiler knows that's pointless as it's about to be copy-constructed. So, at the point of calling empty(), you are acting on an uninitialised variable, and that's undefined behaviour.

(Aside: This vindicates my preferred naming convention, m_sVal. It's much more difficult to accidentally type 2 characters than to forget a trailing underscore. But if anything, people normally include the trailing underscore in the member, not the argument. Anyway!)

But wait!

Problem 2

While your first constructor for the VCHAR variant of MyType_t guarantees UB, your second strongly invites it:

explicit MyType_t(const char* psz) : 
    Type{DataType::VCHAR}, isNull{psz ? true : false}, sVal{psz}, dVal{0}
{ }

In this version, you allow the possibility that sVal will be constructed from a null pointer. But let's have a look at http://en.cppreference.com/w/cpp/string/basic_string/basic_string - where the std::string constructors taking char const *, (4) and (5) with the latter being the one called here, are annotated as follows:

5) Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. The behavior is undefined if s does not point at an array of at least Traits::length(s)+1 elements of CharT, including the case when s is a null pointer.

This is formalised in the C++11 Standard at 21.4.2.9: https://stackoverflow.com/a/10771938/2757035

It's worth noting that GCC's libstdc++ as used by g++ does throw an exception when passed a nullptr: "what(): basic_string::_S_construct null not valid". I might infer from your mention of gdb that you're using g++ - in which case, this probably isn't your problem, unless you're using an older version than me. Still, the mere possibility should still be avoided. So, guard your initialisation as follows:

sVal{psz ? psz : ""}

Result

From this point on, due to UB, anything can happen - at any point in your program. But a common 'default behaviour' for UB is a segfault.

With the constructor taking std::string, even if the invalid call to .empty() completes, isNull won't just get a 'random' value based on the uninitialised .size(): rather, because it's constructed from UB, isNull is undefined, and tests of it might be removed or inlined as some arbitrary constant, potentially leading to wrong code paths being taken, or right ones not. So although sVal gets a valid value eventually, isNull doesn't.

With the constructor taking char const *, your member std::string sVal itself will be followed everywhere by UB if the passed pointer is null. If so, isNull would be OK, but any attempts to use sVal would be undefined.

In both cases, any number of unknowable things might happen, because UB is involved. If you want to know exactly what's happened in this case, disassemble your program.

The reason for the segfault not occurring when the vector is passed by reference is nebulous and of little discursive value; in either case, you're reading or copy-constructing from MyType_t elements whose construction involved UB in one way or another.

Solution

The point is that you need to fix both of these erroneous, UB-generating pieces of code and determine whether it resolves the error. It's very likely that it will because what the first and possibly second constructor are currently doing is so UB that, in a practical sense, your program nearly guaranteed to crash somewhere.

You must now comb over your program for any such coding errors and eliminate every single one, or they will catch you out eventually, "on a long enough timeline".

Community
  • 1
  • 1
underscore_d
  • 6,309
  • 3
  • 38
  • 64
  • **P.S.** Sadly I couldn't find a way to make `g++` warn about this. It seems only to offer warnings when the order of initialisation differs from that of declaration, not when ordering dependencies are broken; the two are orthogonal. It also doesn't seem to realise `sVal` is used uninitialised. `-Wall -Wextra -Wpedantic` does nothing. Does any reader know of a way we could make it warn about such code? – underscore_d Jul 02 '16 at 13:11
  • Thanks for pointing out so many helpful points with what little information I was able to provide. BTW, after 3 days wasted, I finally caught the culprit: `v` captured **by reference** by a **static** lambda. >:-\ – Kemal Jul 03 '16 at 19:19
  • @Kemal Wow. A static lambda? There's something I've never even imagined using. Can I ask what you use it for? Anyway, so when passing by value, said lambda would quickly end up working with a dangling reference, and hello UB. But when passing in by reference, you'd get away with it until such time as the referred object was destroyed, which was presumably _long enough_ (if not necessarily a good idea!). Thanks for letting me know, and I hope the experience and my post reduce the risk of you going through similar baffling things in the future...aaand show the need to post complete questions ;-) – underscore_d Jul 03 '16 at 19:31
  • _"Can I ask what you use it for?"_ It appears as a variable in a function called very frequently during expression evaluation. I think the reason behind declaring it static was that I "reasoned" it would be wasteful to have the lambda object constructed at every function call. Not that I tested it, or anything. – Kemal Jul 03 '16 at 19:47
  • @Kemal Your reasoning might've been right. Sadly, even with that, there are so many ways we can catch ourselves out :-) – underscore_d Jul 03 '16 at 19:48