
I am having problems with code that uses strtok_r(). I am parsing a text file in which ";" starts a comment: everything from ";" to the end of the line is ignored, and the remaining tokens (anything separated by white space) are parsed.

Here is such a file:

;;
;; Basic
;;

defun main
5 3 2 * + printnum endl      ;;  (3 * 2) + 5 = 11
3 4 5 rot * + printnum endl  ;;  (3 * 5) + 4 = 19
return

Once I read a line with fgets(), I parse it using strtok_r(). Here is the complete function that attempts this:

void read_token(token* theToken, char* j_file, char* asm_file)
{
    //Declare and initialize variables
    int len;
    char line[1000];
    char *semi_token = NULL;
    char* parse_tok = NULL;
    char* assign = NULL;

    //Open file to begin parsing
    FILE *IN = fopen(j_file, "r");

    //If file pointer NULL
    if (IN == NULL)
    {
        //Print error message
        printf("error: file does not exist\n");

        //Terminate program
        exit(1);
    }

    //File pointer not NULL
    else
    {
        //Initialize char_token linked list
        parsed_element* head = init_list_head();
        head->token = "start";

        print_list(head);

        //Get characters from .j FILE
        while (!feof(IN))
        {
            //Get each line of .j file
            fgets(line, 1000, IN);

            //Compute length of each line
            len = strlen(line);

            //If the line is non-empty and ends with a newline
            if (len > 0 && line[len-1] == '\n')
            {
                //Replace with null
                line[len-1] = '\0';
            }

            //Search for semicolons in .J FILE
            semi_token = strpbrk(line, ";\r\n\t");

            //Replace with null terminator
            if (semi_token) 
            {
                *semi_token = '\0';
            }
            // printf("line is %s\n",line );

            //Point assign at the line buffer (this does not copy it)
            assign = line;

            // printf("line is %s\n",line );

            len = strlen(line);

            printf("line length is %d\n",len );

            // parse_tok = strtok(line, "\r ");

            //Parse each token in line
            while((parse_tok = strtok_r(assign, " ", &assign)))
            {
                printf("token is %s\n", parse_tok);

                insert_head(&head, parse_tok);

                print_list(head);       

                //Obtain length of token
                // len = strlen(parse_tok);

                // printf("len is %d \n", len);

            }

        }    

    }
}
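For comparison, the per-line work in read_token() can be written with an fgets()-driven loop (the form suggested in the first comment below the question) and the documented strtok_r() calling pattern: pass the string on the first call and NULL afterwards. The loop above passes assign as both the string and the save pointer, which works in practice because assign always holds the unscanned remainder, but it is not the documented usage. This is only an illustrative sketch: print_tokens is a hypothetical name, strcspn() with ";\r\n" replaces the strpbrk() call (the original set also contains \t, which would cut a line at its first tab), and strtok_r() is POSIX rather than ISO C, hence the _POSIX_C_SOURCE define.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r() under strict ISO C modes */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a stream line by line, drop everything from
   the first ';' onward, split the rest on spaces/tabs and print each
   token. Returns the number of tokens seen. */
int print_tokens(FILE *in)
{
    char line[1000];
    int count = 0;

    /* The loop condition is the fgets() call itself, so the body never
       runs after a failed read (unlike a while (!feof(in)) loop). */
    while (fgets(line, sizeof line, in) != NULL)
    {
        char *saveptr = NULL;
        char *tok;

        /* strcspn() returns the length of the prefix containing none of
           ";\r\n", so this cuts off the comment and the line ending. */
        line[strcspn(line, ";\r\n")] = '\0';

        /* Pass the line on the first call and NULL on later calls. */
        for (tok = strtok_r(line, " \t", &saveptr); tok != NULL;
             tok = strtok_r(NULL, " \t", &saveptr))
        {
            printf("token is %s\n", tok);
            count++;
        }
    }
    return count;
}
```

Because each token is used before the next fgets() call, no pointer into line has to outlive the current iteration; storing those pointers for later is where things go wrong.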

I am loading each token into a singly linked list. Here is the struct that makes up each node of the list:

typedef struct parsed_element
{
    char* token;
    struct parsed_element* next;
} parsed_element;

Aspects that are working as expected

1) My function correctly truncates each line read by fgets() at the first ";", removing the comment. Here is the output as proof:

1) line length is 0
2) line length is 0
3) line length is 0
4) line length is 0
5) line length is 10
6) line length is 23
7) line length is 27
8) line length is 6

2) My function is properly tokenizing each line. Here is the output to confirm this:

token is defun
token is main
token is 5
token is 3
token is 2
token is *
token is +
token is printnum
token is endl
token is 3
token is 4
token is 5
token is rot
token is *
token is +
token is printnum
token is endl
token is return

Aspects NOT working as expected

1) The problem comes when I try to insert each token into a singly linked list. After I obtain each token, I pass it to a function that inserts it at the head of an already initialized linked list. The expected list after every iteration of the while loop containing strtok_r() is:

1) List is: start
2) List is: defun start
3) List is: main defun start
4) List is: 5 main defun start
5) List is: 3 5 main defun start
6) List is: 2 3 5 main defun start
7) List is: * 2 3 5 main defun start
8) List is: + * 2 3 5 main defun start
9) List is: printnum + * 2 3 5 main defun start
10) List is: endl printnum + * 2 3 5 main defun start
11) List is: 3 endl printnum + * 2 3 5 main defun start
12) List is: 4 3 endl printnum + * 2 3 5 main defun start
13) List is: 5 4 3 endl printnum + * 2 3 5 main defun start
14) List is: rot 5 4 3 endl printnum + * 2 3 5 main defun start
15) List is: * rot 5 4 3 endl printnum + * 2 3 5 main defun start
16) List is: + * rot 5 4 3 endl printnum + * 2 3 5 main defun start
17) List is: printnum + * rot 5 4 3 endl printnum + * 2 3 5 main defun start
18) List is: endl printnum + * rot 5 4 3 endl printnum + * 2 3 5 main defun start
19) List is: return endl printnum + * rot 5 4 3 endl printnum + * 2 3 5 main defun start

Instead this is what I observe:

1) List is: start 
2) List is: defun start 
3) List is: main defun start 
4) List is: 5 * + printnum endl 5 start 
5) List is: 3 5 * + printnum endl 5 start 
6) List is: 2 3 5 * + printnum endl 5 start 
7) List is: * 2 3 5 * 5 start 
8) List is: + * 2 3 5 * 5 start 
9) List is: printnum + * 2 3 5 * 5 start 
10) List is: endl printnum + * 2 3 5 * 5 start 
11) List is: 3 num endl * + printnum endl t * + printnum endl rot * + printnum endl 5 rot * + printnum endl 4 5 rot * + printnum endl 3 rot * + printnum endl 3 start 
12) List is: 4 3 num endl * + printnum endl t * + printnum endl rot * + printnum endl 5 rot * + printnum endl 4 3 rot * + printnum endl 3 start 
13) List is: 5 4 3 num endl * + printnum endl t * + printnum endl rot * + printnum endl 5 4 3 rot * + printnum endl 3 start 
14) List is: rot 5 4 3 num endl * + printnum endl t rot 5 4 3 rot 3 start 
15) List is: * rot 5 4 3 num endl * t rot 5 4 3 rot 3 start 
16) List is: + * rot 5 4 3 num endl * t rot 5 4 3 rot 3 start 
17) List is: printnum + * rot 5 4 3 num * t rot 5 4 3 rot 3 start 
18) List is: endl printnum + * rot 5 4 3 num * t rot 5 4 3 rot 3 start 
19) List is: return endl printnum + *  rn turn return num * t  rn turn return  return start 

After the third iteration, my insert_head() function seems to fail: it no longer simply adds each token at the head of the list, and the tokens already in the list get mangled. Why would this be happening? I'm fairly sure the problem is not in my linked list functions insert_head() and print_list().

They have been rigorously tested and work in other applications. My feeling is that the problem has something to do with the way I'm parsing each token, or with the way those utilities interact.

I am posting the code for my insert_head() and print_list() functions for reference:

LIST_STATUS insert_head(struct parsed_element** head, char* token);
void print_list(struct parsed_element* head);

LIST_STATUS insert_head(struct parsed_element** head, char* token)
{
    //Check whether *head is NULL
    if (!*head)
    {
        //Return status
        return LIST_HEAD_NULL;
    }

    //Case where head is not NULL
    else
    {
        //Create new node
        parsed_element* new_node;

        //Malloc space for new node
        new_node = (parsed_element*)malloc(sizeof(parsed_element));

        //Case where malloc succeeded
        if (new_node != NULL)
        {
            //Set token of new node
            new_node->token = token;

            //Point new node to address of head
            new_node->next = *head;

            //New node is now head node (CHECK FOR POTENTIAL ERRORS)
            *head = new_node;

            //Return status
            return LIST_OKAY;
        }

        //Case where malloc returns NULL
        else 
        {
            //Print malloc error
            printf("Malloc error: aborting\n");

            exit(1);
        }
    }   
}

void print_list(struct parsed_element* head)
{
    //Create variable to store head pointer
    parsed_element* print_node = head;

    //Print statement
    printf("List is: ");

    //Traverse list
    while (print_node != NULL)
    {
        //Print list element
        printf("%s ",print_node->token);

        //Increment pointer
        print_node = print_node->next;
    }

    //Print newline
    printf("\n");
}
Asked by J_code, edited by halfer
  • Please see [Why is `while ( !feof (file) )` always wrong?](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). The idiomatic loop is like `while(fgets(line, 1000, IN) != NULL) { ... }` – Weather Vane Jun 24 '18 at 17:25
  • @WeatherVane I believe I have seen this thread before, but changing it has no effect on the specific problem I'm having. – J_code Jun 24 '18 at 17:35
  • @user3121023 can you clarify the implementation procedure? And also, I'm not sure I understand what you mean by "the earlier pointers still point to line". Line is being updated with every iteration of the loop. – J_code Jun 24 '18 at 17:46
  • ... and its content changes, so the previous pointers no longer point to anything useful. – Weather Vane Jun 24 '18 at 17:47
  • If you don't have `strdup` (it is non-standard) you can make your own function by using `malloc` (with `strlen()` + 1!) and `strcpy`. – Weather Vane Jun 24 '18 at 17:51
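Following that suggestion, a strdup() replacement needs only malloc() and strcpy(). The name dup_string is hypothetical; strdup() itself is specified by POSIX and was added to ISO C in C23.

```c
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for strdup(): allocate strlen + 1 bytes (the +1 is
   for the terminating '\0') and copy the string into them.
   Returns NULL if malloc() fails; the caller must free() the result. */
char *dup_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1);

    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}
```

The copy is independent of src, so it stays intact even when the buffer it came from is overwritten by the next fgets() call.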

1 Answer

Your function read_token uses a local variable, line, for reading file contents. When you tokenize this line using strtok_r, you receive pointers into the memory of that local array. Passing such pointers to insert_head, which simply stores the pointer (but does not copy the content), leaves the list nodes pointing into line. Every call to fgets then overwrites line with the next line of the file, so tokens stored from earlier lines suddenly show whatever text now sits at the same offsets; that is the corruption you see from the 4th iteration on. And once read_token returns, line goes out of scope, so those pointers become completely invalid.
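The effect can be reproduced without any list at all. In the sketch below (alias_demo is a hypothetical name, and strcpy() stands in for fgets() refilling the same array), a pointer saved from one line reads different text once the next line is written into the buffer. The result matches the `* + printnum endl` fragment in the question's 4th list, in the place where `main` should appear.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo of the bug in isolation: keep a pointer into a
   buffer, then reuse the buffer for the next line. The text the kept
   pointer sees after the overwrite is copied into out. */
void alias_demo(char *out, size_t out_size)
{
    char line[32];
    char *kept;

    strcpy(line, "defun main");
    kept = line + 6;                          /* like storing the "main" token */

    strcpy(line, "5 3 2 * + printnum endl");  /* next line, same buffer */

    /* kept itself never changed, but the bytes it points at did:
       it now reads "* + printnum endl" instead of "main". */
    snprintf(out, out_size, "%s", kept);
}
```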

So instead of

new_node->token = token;

you need to copy the token's content, i.e. write

new_node->token = malloc(strlen(token)+1);
strcpy(new_node->token,token);
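Putting the copy into insert_head() itself could look like the sketch below. It is self-contained, so it redeclares parsed_element; the LIST_STATUS definition is assumed (the question does not show it), LIST_NO_MEMORY is a hypothetical extra value, and token is taken as const char * because the function no longer keeps the caller's pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed definition: the question does not show LIST_STATUS, and
   LIST_NO_MEMORY is a hypothetical extra value. */
typedef enum { LIST_OKAY, LIST_HEAD_NULL, LIST_NO_MEMORY } LIST_STATUS;

typedef struct parsed_element
{
    char *token;
    struct parsed_element *next;
} parsed_element;

/* Insert a COPY of token at the head, so the node no longer depends on
   the caller's buffer (e.g. the line array reused by fgets()). */
LIST_STATUS insert_head(parsed_element **head, const char *token)
{
    parsed_element *new_node;

    if (*head == NULL)
        return LIST_HEAD_NULL;

    new_node = malloc(sizeof *new_node);
    if (new_node == NULL)
        return LIST_NO_MEMORY;

    /* strlen + 1 leaves room for the terminating '\0' */
    new_node->token = malloc(strlen(token) + 1);
    if (new_node->token == NULL)
    {
        free(new_node);            /* don't leak the node itself */
        return LIST_NO_MEMORY;
    }
    strcpy(new_node->token, token);

    new_node->next = *head;
    *head = new_node;
    return LIST_OKAY;
}
```

Returning a status on allocation failure, instead of calling exit(), leaves the decision to the caller; the trade-off is that every node now owns memory that someone has to free.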
Answered by Stephan Lechner
  • This is tricky because the linked list implementation is in another file that is linked in via a Makefile. I raise this point because presumably I would need to free `new_node->token` somehow. How can I do this when the code that invokes `insert_head()` calls it in a loop from another file? This matters because it is causing memory leaks. – J_code Jul 25 '18 at 21:28
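One common way to handle this, sketched below under stated assumptions, is to keep the cleanup next to the list code: a free_list() function (hypothetical, not part of the question's code) defined in the same file as insert_head() and declared in the same header, so the file containing read_token() can call it once the list is no longer needed. In the question's read_token(), head is a local variable, so the list has to be freed (or head returned to the caller) before read_token() returns. The sketch assumes every token was allocated with malloc(), including the one in the first node; the question sets head->token = "start", a string literal, which must not be passed to free().

```c
#include <stdlib.h>

typedef struct parsed_element
{
    char *token;
    struct parsed_element *next;
} parsed_element;

/* Hypothetical companion to insert_head(): frees every node and the
   token copy it owns. Assumes ALL tokens were malloc'd; a node whose
   token is a string literal must not be passed through this. */
void free_list(parsed_element *head)
{
    while (head != NULL)
    {
        parsed_element *next = head->next;  /* save before freeing */
        free(head->token);
        free(head);
        head = next;
    }
}
```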