I am writing a fairly simple program that takes a file as input, reads it, and lexes it (lexical analysis). I will get to parsing and interpreting later.

I have written similar programs before that worked perfectly, but surprisingly, this one hangs when I add a function call at the end.

The following works perfectly:

#include <stdio.h>
#include <stdlib.h>
#include "include/slex.h"

int main(void) {
    // Read the file and save it into source.
    printf("start\n");
    FILE* fp = fopen("test.sl", "r");
    printf("fileopen\n");
    fseek(fp, 0, SEEK_END);
    printf("seek\n");
    char* source = malloc((ftell(fp) + 1) * sizeof (char)); // +1 because of the null terminator.
    printf("allocation\n");
    fseek(fp, 0, SEEK_SET);
    char c;
    int i = 0;
    while ((c = fgetc(fp)) != EOF) {
        source[i++] = c;
    } // Iterate through every single character in the file and store it into source.
    
    source[i] = '\0';
    fclose(fp);
    // Now lex the file;
    printf("Lex\n");
    lexer_t lexer;
    printf("lex2\n");
    lexer_init(&lexer, source);
    printf("lex3\n");
    /*
    lex(&lexer);
    printf("lex4");
    tl_print(&lexer.tokens);
    */
}

But when I uncomment lex(&lexer), the program just hangs. It does not even print the output of the earlier printf statements.

The function lex is defined in slex.c and slex.c includes slex.h.

I compiled it with gcc -Wall -Wextra -o sl main.c slex.c, and it gives me no warnings and no errors.

void lex(lexer_t* lexer):

void lex(lexer_t* lexer) {
    printf("lex"); // Debugging
    // Some people would call this function "make_tokens", I will call it "lex".
    while (lexer->current != '\0') {
        if (lexer->current == '\n') {
            token_t token = {.type = NEWLINE};
            tl_append(&lexer->tokens, token);
        }

        if (isdigit(lexer->current)) {
            token_t token = {.type = INT, .int_value = lex_integer(lexer)};
            tl_append(&lexer->tokens, token);
        }

        else if (isalpha(lexer->current)) {
            token_t token = {.type = ID, .id_value = lex_identifier(lexer)};
            tl_append(&lexer->tokens, token);
        }
    }
}

I hope someone finds a solution to my problem, because I do not understand it. Have a nice day and thank you.

And do not hesitate to ask if you need more information, just comment it and I will edit my question.

ademyro
  • stdout is buffered. If you add `fflush(stdout)` before you call `lex()`, you should see all of your output (followed by the application hanging). – Bill Lynch Sep 10 '22 at 16:15
  • Also, does your `lex()` function ever modify `lexer->current`? It doesn't seem like it does. Which would explain why that function never completes. – Bill Lynch Sep 10 '22 at 16:17
  • This sounds like the perfect time to learn how to [*debug*](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) your programs. More specifically how to use a [*debugger*](https://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems) to step through your code statement by statement while monitoring variables and their values. – Some programmer dude Sep 10 '22 at 16:19
  • @BillLynch Adding fflush solved it! And yes, lex_integer() and lex_identifier() both modify lexer->current. I will debug it. But my main problem was just my printf statements not showing up. Thanks. – ademyro Sep 10 '22 at 16:20
  • @Someprogrammerdude I will debug it, thanks everyone for your advice. – ademyro Sep 10 '22 at 16:21
  • If `lexer->current` is a newline (or non-digit/non-alpha), `lex` will loop forever – Craig Estey Sep 10 '22 at 16:25
  • Time to _fill that gap_: EOF is an integer value. `c` has been declared as `char`... `fgetc( )` returns an `int`... Do you see where this goes? – Fe2O3 Sep 10 '22 at 20:51
  • @Fe2O3 I am not sure I understand what you are trying to say, but I guess it means that I am assigning an integer to a char, and then comparing that char with an integer value? – ademyro Sep 10 '22 at 21:00
  • To reliably determine if EOF has been returned from `fgetc( )`, the variable must be declared as an `int`... Those 256 different byte values that are **not** EOF can be relied upon to fit into a `char`... Demoting a possible return of EOF to a char before it is tested is not reliable... Simply change the declaration of `c` from `char` to `int`... (Better yet, forget the loop and use `fread( )`.) AND start checking return values for all system call functions... open can fail. malloc can fail... and so on... – Fe2O3 Sep 10 '22 at 21:05
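
The `char` versus `int` point from the comments above can be sketched roughly as follows. This is only an illustration, not the asker's code: the helper name `read_stream` and the doubling buffer are assumptions made for the example. The part that matters is that the result of `fgetc` is held in an `int`, so `EOF` stays distinguishable from every valid byte (including 0xFF).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the original program): reads everything
   remaining in fp into a NUL-terminated heap buffer. Returns NULL on
   allocation failure or read error. The caller frees the result. */
char *read_stream(FILE *fp, size_t *out_len) {
    assert(fp != NULL);
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    int c; /* int, not char: fgetc returns an unsigned char value or EOF */
    while ((c = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) { /* keep one byte free for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (ferror(fp)) { free(buf); return NULL; } /* EOF caused by an error */

    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}
```

With `char c`, a 0xFF byte in the file can compare equal to `EOF` on platforms where `char` is signed, and the loop may never terminate where `char` is unsigned.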

2 Answers

As Bill Lynch said, adding fflush(stdout); before lex(&lexer); solved my issue. Thanks to everyone who came by this question, I really appreciate your help. I wish you all a nice day.
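
For anyone landing here later, a minimal sketch of that fix. The `trace` helper is hypothetical (it is not in my program); it writes a progress message and flushes right away, so the text has already left the buffer if a later call hangs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a progress message followed by a newline and
   flush it immediately. Returns 0 on success, EOF if writing or flushing
   failed. */
int trace(FILE *out, const char *msg) {
    assert(out != NULL && msg != NULL);
    if (fputs(msg, out) == EOF) return EOF;
    if (fputc('\n', out) == EOF) return EOF;
    return fflush(out); /* push the buffered text out now */
}
```

Usage would look like `trace(stdout, "lex3");` just before `lex(&lexer);`. Other options that should have a similar effect are calling `setvbuf(stdout, NULL, _IONBF, 0);` at the start of main, or printing debug messages to stderr, which is not fully buffered by default.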

ademyro

Credit to @Bill Lynch (first comment), who noted that the application will likely hang because of an infinite loop.

It's one thing to printf() to confirm operation. It's something else to check return codes and exit with a helpful message.

Below is a rewritten (untested) version of your code using short name aliases to enhance clarity of what is being performed at different locations.

#include <ctype.h>  // isdigit(), isalpha()
#include <stdio.h>
#include <stdlib.h>
#include "include/slex.h"

void lex( lexer_t *l ) {
    fprintf( stderr, "Entered lex()\n" ); // debug

    char c;
    while( ( c = l->current ) != '\0' ) {
        token_t t = { 0 };

        if( c == '\n') t.type = NEWLINE;
            // still infinite loop...
            // what "advances" the pointer?? Credit @Craig Estey

        else if( isdigit( (unsigned char)c ) ) t.type = INT, t.int_value = lex_integer( l );

        else if( isalpha( (unsigned char)c ) ) t.type = ID, t.id_value = lex_identifier( l );

        else { // equivalent to "default:" in a "switch"
            fprintf( stderr, "Un-lex-able char ('%c') encountered\n", c );
            exit( EXIT_FAILURE );
        }

        tl_append( &l->tokens, t );
    }
}

int main() {
    char *fname = "test.sl";

    FILE* fp = fopen( fname, "r" );
    if( fp == NULL ) {
        fprintf( stderr, "Failed to open %s\n", fname );
        exit( EXIT_FAILURE );
    }

    fseek( fp, 0, SEEK_END );
    size_t size = ftell( fp );
    fseek(fp, 0, SEEK_SET);

    fprintf( stderr, "Size %zu\n", size ); // debug

    char *buf = malloc( (size + 1) * sizeof *buf ); // +1 because of the null terminator.
    if( buf == NULL ) {
        fprintf( stderr, "Failed to alloc block %zu\n", size + 1 );
        exit( EXIT_FAILURE );
    }

    size_t nread = fread( buf, sizeof buf[0], size, fp );
    if( nread != size ) {
        fprintf( stderr, "Expected %zu. Read %zu\n", size, nread );
        exit( EXIT_FAILURE );
    }
    fclose( fp );
    buf[ nread ] = '\0';

    lexer_t lexer;
    lexer_init( &lexer, buf );

    lex( &lexer );

    // free( buf ); // Unsure if you refer back to original..

    tl_print( &lexer.tokens );

    return 0;
}

You can add as many optimistic "making progress" printf calls as you'd like. It's the pessimistic "not working" print statements that are needed to push forward.
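
To illustrate the "what advances the pointer?" question, here is a minimal, self-contained sketch of a loop in which every branch either consumes at least one character or returns. The real `lexer_t` lives in slex.h, which the question does not show, so `mini_lexer` and `count_tokens` are hypothetical stand-ins, and skipping blanks is an assumption about the language being lexed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the real lexer_t from slex.h. */
typedef struct {
    const char *src; /* NUL-terminated input */
    size_t pos;      /* index of the current character */
} mini_lexer;

/* Counts the tokens in the input. Returns the count, or -1 at the first
   character that no branch knows how to handle. */
long count_tokens(mini_lexer *lx) {
    assert(lx != NULL && lx->src != NULL);
    long count = 0;
    while (lx->src[lx->pos] != '\0') {
        unsigned char c = (unsigned char)lx->src[lx->pos];
        if (c == '\n') {
            lx->pos++;              /* consume the newline itself */
        } else if (c == ' ' || c == '\t') {
            lx->pos++;              /* skip blanks, they produce no token */
            continue;
        } else if (isdigit(c)) {
            while (isdigit((unsigned char)lx->src[lx->pos])) lx->pos++;
        } else if (isalpha(c)) {
            while (isalpha((unsigned char)lx->src[lx->pos])) lx->pos++;
        } else {
            return -1;              /* report the problem instead of spinning */
        }
        count++;
    }
    return count;
}
```

The design point is that no path through the loop body leaves `pos` unchanged, so the loop either reaches the terminator or returns early.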

Fe2O3
  • Thank you, I will use `fprintf` on `stderr` to debug from now on. And this version is really a lot more readable than mine! What advances the pointer are the lex_integer and lex_identifier functions, but I already fixed the infinite loop. Thanks for the advice! – ademyro Sep 11 '22 at 08:59
  • Yeah... good advice... worth every penny you paid... – Fe2O3 Sep 11 '22 at 09:07