
I am currently working on creating a small compiler using C++. I have defined the following objects:

struct ValueNode
{
    std::string name;
    int value;
};

struct StatementNode
{
    StatementType type;

    union
    {
        struct AssignmentStatement * assign_stmt;
        struct PrintStatement * print_stmt;
        struct IfStatement * if_stmt;
        struct GotoStatement * goto_stmt;
    };

    struct StatementNode * next; // next statement in the list or NULL
};

I have defined a series of functions, one for each type of statement in the language. One of these is parse_assign_stmt(). The segmentation fault I am experiencing happens in this function, immediately after I attempt to assign a value to recently allocated memory. Here is that function:

struct StatementNode* parse_assign_stmt() {

    //Object to be returned. Holds an object representing a statement
    //made within the input program.
    struct StatementNode* st = (struct StatementNode*)malloc(sizeof(struct StatementNode));

    st->type = ASSIGN_STMT;

    //First token should be an ID. Represents memory location we are assigning to.
    Token tok = lexer->GetToken(); 

    if(tok.token_type == ID) {

        //Second token in an assignment should be an equal sign
        Token tok2 = lexer->GetToken();

        if (tok2.token_type == EQUAL) {

            //This function reads the next token, makes sure it is of type NUM or ID, then creates and returns a ValueNode containing the relevant value.
            struct ValueNode* rhs1 = parse_primary();

            Token tok3 = lexer->GetToken();

            //Assignment format for this logical branch: "x = 5;"
            if(tok3.token_type == SEMICOLON) {
                //first type

                //Allocate memory for objects needed to build StatementNode st
                struct AssignmentStatement* assign_stmt = (struct AssignmentStatement*)malloc(sizeof(struct AssignmentStatement));

                struct ValueNode* lhs = (struct ValueNode*)malloc( sizeof(struct ValueNode));

                printf("Name: %s, Value: %d\n", lhs->name.c_str(), lhs->value);


                //PROBLEM ARISES HERE***
                //lhs->name = tok.lexeme;               

                //return the proper structure
                return st;
            }
            else if(tok3.token_type == PLUS || tok3.token_type == MINUS || tok3.token_type == DIV || tok3.token_type == MULT) {
                //second type

                //TODO
            }
            else {
                printf("Syntax error. Semicolon or operator expected after first primary on RHS of assignment.");
                exit(1);
            }
        }
        else {
            //not of proper form
            printf("Syntax error. EQUAL expected after LHS of assignment.");

            exit(1);
        }
    }
    else {
        //Not of proper form. Syntax error
        printf("Syntax error. ID expected at beginning of assignment.");

        exit(1);
    }
}

Essentially, I'm allocating memory for a new ValueNode to create the variable lhs. I print the name and value fields immediately to check that nothing is stored there yet. The program's output (I'm compiling with g++, by the way) shows that the name is (null) and the value is 0, which is what I expected. As soon as I uncomment the line

lhs->name = tok.lexeme;

I get a segmentation fault. At this point, I have no idea what could be going wrong. I'm creating the variable, using malloc to allocate memory to the location, making sure that there isn't anything stored there, and then immediately trying to write a value. And it always gives me a segmentation fault.

Here is the input program (.txt file) that is being fed to the program through stdin.

i;
{
    i = 42 ; 
    print i;
}

I have tried using calloc() instead, since that should make sure that the memory is cleared before returning the pointer, but that didn't change anything. Any suggestions would be wonderful. Thank you!

  • Do not use `malloc()` with c++ code. – user0042 Dec 01 '17 at 00:19
  • You might want to consider replacing your unioned `struct` with a proper inheritance hierarchy. Then you won't be thought of as a C+ programmer, that strange breed who never quite make the transition fully from C to C++ :-) – paxdiablo Dec 01 '17 at 00:26
  • [This question](https://stackoverflow.com/questions/46991224/are-there-any-valid-use-cases-to-use-new-and-delete-raw-pointers-or-c-style-arr) also gives you some hints how, and why to drop _manual dynamic memory allocation_ for most cases in c++ completely. – user0042 Dec 01 '17 at 01:17
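
For illustration, the inheritance-based layout suggested in the comment above might look roughly like the sketch below. The names Statement, AssignStatement, PrintStatement, describe() and build_example_program() are made up for this example and do not come from the question's code, and std::make_unique assumes C++14 or later.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class standing in for the tagged-union StatementNode.
struct Statement {
    virtual ~Statement() = default;
    virtual std::string describe() const = 0;
};

// Hypothetical assignment statement of the form "x = 5;".
struct AssignStatement : Statement {
    std::string target;
    int value;
    AssignStatement(std::string t, int v) : target(std::move(t)), value(v) {}
    std::string describe() const override {
        return target + " = " + std::to_string(value);
    }
};

// Hypothetical print statement of the form "print x;".
struct PrintStatement : Statement {
    std::string name;
    explicit PrintStatement(std::string n) : name(std::move(n)) {}
    std::string describe() const override { return "print " + name; }
};

// A vector of owning pointers replaces the hand-linked "next" list,
// and every node is properly constructed and destroyed.
std::vector<std::unique_ptr<Statement>> build_example_program() {
    std::vector<std::unique_ptr<Statement>> program;
    program.push_back(std::make_unique<AssignStatement>("i", 42));
    program.push_back(std::make_unique<PrintStatement>("i"));
    return program;
}
```

With this layout there is no type tag to keep in sync with a union member, and no call to malloc() or free() anywhere.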

2 Answers


If the problem arises in the line:

lhs->name = tok.lexeme; 

then I'd warrant the problem lies with either lhs or tok.lexeme.

Since, prior to that, you appear to have confirmed that lhs is okay with:

printf("Name: %s, Value: %d\n", lhs->name.c_str(), lhs->value);

then the chances that it's an issue with the token structure skyrocket.

However, we shouldn't need to surmise: you should be able to load the code into a good debugger (or even gdb, in a pinch(a)), set a breakpoint at the offending line, and actually look at the variables to see if they are as expected. Or, at a bare minimum, print out more stuff before trying to use it.


Aside: It's always been a bugbear of mine that the first course taught at university isn't Debugging 101. That's the first thing I taught my son once he started doing Python development.


(a) Pax ducks for cover :-)

paxdiablo
  • Aside^2: depends on university I guess. The first thing they showed us after malloc and free was valgrind. And they showed us gdb before that too, but that one is too scary for most people – Ordoshsen Dec 01 '17 at 00:45
  • Hi Pax. I also wish that debugging were a more thoroughly-covered topic in my program. I appreciate the suggestion of using a breakpoint. Are there any specific resources you'd recommend that I read through in order to gain a better understanding of debugging? – Daniel Regan Dec 01 '17 at 01:25
  • Hey, Pax, I ran the program through gdb and found out some new information! I added a breakpoint directly before this line: lhs->name = tok.lexeme; At that point, the two offending structures appear to be rhs1 and lhs. When running "p *rhs1" in gdb, I get: {name = <error reading variable: Cannot access memory at address 0xfffffffe8>, value = 42} I get something nearly identical for "p *lhs", with a different memory address being inaccessible, and the value being 0. Both values are as expected, but for some reason the memory for both names is unreadable. Any ideas? – Daniel Regan Dec 01 '17 at 02:44
  • Don't know about `lhs` since you haven't given the pointer value, but `0xfff...fe8` (-24) does *not* look like a valid pointer to me. Suggest you examine `parse_primary()` to see if/why it's returning that. But at least you're well on your way re debugging :-) – paxdiablo Dec 01 '17 at 04:49

After further investigation, I found that when allocating memory (using malloc()) for my ValueNode objects (and for some reason, only these ones), malloc() was returning a pointer to inaccessible memory. The error I received in gdb when trying to print my ValueNode structure was:

{name = <error reading variable: Cannot access memory at address 0xfffffffe8>, value = 42}

Unfortunately, I was not able to find a way to allocate the memory for this object using malloc(). A workaround that I managed to make happen, however, was to create a constructor within the structure definition of ValueNode, then use "new" to create the object and allocate the memory for me, rather than trying to force malloc() to work. In retrospect, I probably should have used this simpler approach over malloc() to begin with.

  • If you for some reason need to use malloc with C++, google "placement new", which allows you to create objects in already allocated memory. – hyde Dec 01 '17 at 17:29
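
A likely explanation for the behaviour above is that malloc() only reserves raw bytes and never runs the std::string constructor for name, so the string's internal pointer is garbage and assigning to it reads through that pointer. The sketch below shows both the "new" workaround and the placement-new variant from the comment. The two-argument constructor and the helper names make_with_new, make_with_placement_new and destroy_placement are assumptions made for illustration, not code from the original program.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

struct ValueNode {
    std::string name;
    int value;
    // Assumed constructor; the answer only says that one was added.
    ValueNode(const std::string& n, int v) : name(n), value(v) {}
};

// Option 1: new allocates storage and runs the constructor,
// so the std::string member is valid before it is used.
ValueNode* make_with_new(const std::string& lexeme, int value) {
    return new ValueNode(lexeme, value);
}

// Option 2: keep malloc() for the storage, then construct the
// object inside that storage with placement new.
ValueNode* make_with_placement_new(const std::string& lexeme, int value) {
    void* raw = std::malloc(sizeof(ValueNode));
    if (raw == nullptr) {
        return nullptr;
    }
    return new (raw) ValueNode(lexeme, value);
}

// Objects built with placement new need an explicit destructor
// call before the raw storage is handed back to free().
void destroy_placement(ValueNode* node) {
    if (node == nullptr) {
        return;
    }
    node->~ValueNode();
    std::free(node);
}
```

A node from make_with_new is released with delete, while a node from make_with_placement_new must go through destroy_placement; mixing the two is undefined behaviour.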