I'm working on an assembler for a hypothetical machine (the SMAC-0 machine) and need some help with memory allocation.

I'll be reading lines from a given file, tokenizing them, and storing the tokens in buffers referenced by character pointers.

Here's a code snippet:

tokenCount = sscanf(buffer,"%s %s %s %s", tokenOne, tokenTwo, tokenThree, tokenFour);

where tokenCount is an integer, buffer is the temporary buffer that stores the line taken from the input file, and tokenOne, tokenTwo, tokenThree, and tokenFour are character pointers.

The strings accepted from the file can have one to four words:

Example:


            READ    N
    N:      DS      1
    SUM:    DS      1
    LOOP:   MOVER   AREG    N
            ADD     AREG    N
            COMP    AREG    ='5'
            BC      LE      LOOP
            MOVEM   AREG    SUM
            PRINT   SUM
            STOP

My queries are:

  • How can I find out how large a token is, so that I know how much memory to allocate for the respective token pointer? (The same question applies to the buffer pointer, since the labels, e.g. LOOP, N, SUM, can be of variable size.)

  • How can I do the same using scanf() or other input functions like gets()?

1 Answer

    You should declare your token buffers large enough. To be on the safe side, make each of them as large as the input buffer itself, since no token can be longer than the line it was scanned from. See the thread "How to prevent scanf causing a buffer overflow in C?" for more information.
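    If you would rather use smaller token buffers, a field width in the conversion keeps sscanf from writing past the end of them. The following is only a sketch under assumptions: split_line and TOKEN_MAX are hypothetical names, and the limit of 31 characters is arbitrary. The width has to be written as a literal in the format string and must match the buffer size minus one.

```c
#include <stdio.h>

/* Hypothetical limit for this sketch: longest token accepted, excluding '\0'. */
#define TOKEN_MAX 31

/* Splits one source line into at most four tokens.
 * The width in "%31s" must equal TOKEN_MAX: sscanf stores at most that many
 * characters per token and then appends the terminator.
 * Returns the number of tokens stored (0 to 4). */
int split_line(const char *line, char tokens[4][TOKEN_MAX + 1])
{
    int count;

    /* Clear all tokens so unused ones read as empty strings. */
    for (int i = 0; i < 4; i++)
        tokens[i][0] = '\0';

    count = sscanf(line, "%31s %31s %31s %31s",
                   tokens[0], tokens[1], tokens[2], tokens[3]);

    /* sscanf returns EOF for an empty or all-whitespace line. */
    return count < 0 ? 0 : count;
}
```

    Note that a token longer than the limit is not rejected: it is cut at the limit and the remainder is scanned as the next token, so a real assembler would probably want to detect that case and report an error.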

    If you're using the GNU C library (glibc), you can make use of an extension that lets scanf allocate the buffers dynamically on your behalf. See "Dynamic allocation with scanf()".

    EXAMPLES:

    Using predefined buffers for the scanned tokens:

    Note that all token buffers have the same size as the input buffer:

    /* sscanf-test.c */
    #include <stdio.h>
    
    int main(int argc, char** argv)
    {
      FILE *file = fopen("sample.txt", "r");
      if (file == NULL)
      {
        perror("sample.txt");   /* fgets on a NULL stream would crash */
        return 1;
      }
      const int BufferSize=256;
      char buffer[BufferSize];
      char tokenOne[BufferSize];
      char tokenTwo[BufferSize];
      char tokenThree[BufferSize];
      char tokenFour[BufferSize];
    
      while (fgets(buffer, sizeof(buffer), file) != NULL)
      {
        tokenOne[0]='\0';
        tokenTwo[0]='\0';
        tokenThree[0]='\0';
        tokenFour[0]='\0';
        int tokenCount = sscanf(buffer, "%s %s %s %s", tokenOne, tokenTwo, tokenThree, tokenFour);
        printf("scanned %d tokens   1:%s 2:%s 3:%s 4:%s\n", tokenCount, tokenOne, tokenTwo, tokenThree, tokenFour);
      }
    
      fclose(file);
      return 0;
    }
    

    The program produces the following output (I cleaned up the formatting a little bit to improve readability):

    gcc sscanf-test.c -o sscanf-test
    ./sscanf-test 
    scanned 2 tokens   1:READ   2:N    3:     4: 
    scanned 3 tokens   1:N:    2:DS    3:1    4: 
    scanned 3 tokens   1:SUM:  2:DS    3:1    4: 
    scanned 4 tokens   1:LOOP: 2:MOVER 3:AREG 4:N 
    scanned 3 tokens   1:ADD   2:AREG  3:N    4: 
    scanned 3 tokens   1:COMP  2:AREG  3:='5' 4: 
    scanned 3 tokens   1:BC    2:LE    3:LOOP 4: 
    scanned 3 tokens   1:MOVEM 2:AREG  3:SUM  4: 
    scanned 2 tokens   1:PRINT 2:SUM   3:     4: 
    scanned 1 tokens   1:STOP  2:      3:     4:
    

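    If you want to store the scanned tokens for later processing, you'll have to copy them somewhere else inside the while loop, because the token buffers are overwritten on every iteration. The function strlen (from <string.h>) gives you the length of a token, excluding the trailing string terminator '\0', so a copy needs one byte more than that.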
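    As a minimal sketch of that copying step: copy_token below is a hypothetical helper, not something from the programs in this answer. It measures the token with strlen and allocates exactly enough memory for the characters plus the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of token, or NULL if malloc fails.
 * The caller owns the copy and must free() it. */
char *copy_token(const char *token)
{
    size_t length = strlen(token);     /* characters before the '\0' */
    char *copy = malloc(length + 1);   /* +1 for the terminator */

    if (copy != NULL)
        memcpy(copy, token, length + 1);   /* copies the '\0' as well */

    return copy;
}
```

    Inside the loop you would call something like char *label = copy_token(tokenOne); and keep label in whatever table the assembler maintains, freeing it once it is no longer needed.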

    Using dynamic memory allocation for tokens:

    As mentioned above, you could also let scanf allocate the buffers for you dynamically. The scanf(3) man page states that you can use the GNU extensions 'a' or 'm' to do that. Specifically, it says:

    An optional 'a' character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required. This is a GNU extension; C99 employs the 'a' character as a conversion specifier (and it can also be used as such in the GNU implementation)

    I couldn't get scanf to work using the 'a' modifier. However, there's also the 'm' modifier which does the same thing (and more):

    Since version 2.7, glibc also provides the m modifier for the same purpose as the a modifier. The m modifier has the following advantages:

    • It may also be applied to %c conversion specifiers (e.g., %3mc).

    • It avoids ambiguity with respect to the %a floating-point conversion specifier (and is unaffected by gcc -std=c99 etc.)

    • It is specified in the upcoming revision of the POSIX.1 standard.

    /* sscanf-alloc.c */
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char **argv)
    {
      FILE *file = fopen("sample.txt", "r");
      if (file == NULL)
      {
        perror("sample.txt");   /* fgets on a NULL stream would crash */
        return 1;
      }
      const int BufferSize=64;
      char buffer[BufferSize];
      char *tokenOne   = NULL;
      char *tokenTwo   = NULL;
      char *tokenThree = NULL;
      char *tokenFour  = NULL;
    
      while (fgets(buffer, sizeof(buffer), file) != NULL)
      {
        // note the '&': sscanf needs the address of each char* so it can store the buffer it allocates
        int tokenCount = sscanf(buffer, "%ms %ms %ms %ms", &tokenOne, &tokenTwo, &tokenThree, &tokenFour);
        printf("scanned %d tokens   1:%s 2:%s 3:%s 4:%s\n", tokenCount, tokenOne, tokenTwo, tokenThree, tokenFour);
    
        // note: the memory has to be free'd to avoid leaks
        free(tokenOne);
        free(tokenTwo);
        free(tokenThree);
        free(tokenFour);
        tokenOne   = NULL;
        tokenTwo   = NULL;
        tokenThree = NULL;
        tokenFour  = NULL;
      }
    
      fclose(file);
      return 0;
    }
    
    gcc sscanf-alloc.c -o sscanf-alloc
    ./sscanf-alloc
    scanned 2 tokens   1:READ  2:N      3:(null) 4:(null)
    scanned 3 tokens   1:N:    2:DS     3:1      4:(null)
    scanned 3 tokens   1:SUM:  2:DS     3:1      4:(null)
    scanned 4 tokens   1:LOOP: 2:MOVER  3:AREG   4:N
    scanned 3 tokens   1:ADD   2:AREG   3:N      4:(null)
    scanned 3 tokens   1:COMP  2:AREG   3:='5'   4:(null)
    scanned 3 tokens   1:BC    2:LE     3:LOOP   4:(null)
    scanned 3 tokens   1:MOVEM 2:AREG   3:SUM    4:(null)
    scanned 2 tokens   1:PRINT 2:SUM    3:(null) 4:(null)
    scanned 1 tokens   1:STOP  2:(null) 3:(null) 4:(null)
    