
While developing a small program to scan lines of English words for key data items, I selected sscanf() to parse each line. Since each line holds an unknown number of words, sscanf() must be called with the maximum number of possible fields specified in the request, which results in a long and ugly single-line statement. A cleaner technique would be to use sscanf() to obtain one word at a time in a loop. Unfortunately, as far as I can tell, there is no way to know how many spaces sscanf() skipped over to obtain the next field, so I cannot call sscanf() again with a string pointer that reflects the exact spot where the previous call left off. A code example follows. Two questions: 1) am I missing something in the usage of sscanf()? and 2) is there a better way to do this in C?

#include <stdio.h>
#include <string.h>

/*
 * using sscanf to parse a line (null terminated string) with fields (words)
 * separated by one or more spaces into an array of words (fields).
 */

void main()
{
        int     i,j;
        int     idx;
        char    string[100] = "word1 word2  word3  word4    word5    word6  word7\0";
        char    fields[20][10];
#if 1
        j=sscanf (&string[0], "%s%s%s%s%s%s", &fields[0][0], &fields[1][0], &fields[2][0], &fields[3][0], &fields[4][0], &fields[5][0]);
        printf("sscanf returned: %d\n",j);
#else
/*
 *  this would be the preferred way to parse a long line of words,
 *  but there is no way to know with certainty how many spaces sscanf
 *  skipped over to obtain the next string (word). A modified version
 *  of sscanf that either modified an integer pointer argument or
 *  updated the pointer to the input string (line) would allow
 *  subsequent calls to pick up where the last sscanf call left off.
 *
 */
        for (i=0,idx=0;i<6;i++){
                j=sscanf (&string[idx], "%s", &fields[i][0]);
                idx += strlen(&fields[i][0]);
                printf("sscanf returned: %d\n",j);
                if (j==0)
                        break;
        }
#endif

        for (i=0;i<6;i++){
                printf("%s",&fields[i][0]);
        }
        printf("\n");
        return;
}
Vlad from Moscow
Jim
  • Solution works like a charm. Thank you. I've been out of the coding business since 2001. My trusty old C reference book, Harbison & Steele's "C: A Reference Manual" (1987), needs to be retired along with me. – Jim Feb 13 '20 at 01:41
  • [`void main()` is wrong](https://stackoverflow.com/q/204476/995714) – phuclv Feb 13 '20 at 03:10

1 Answer

In the string literal used as an initializer

char    string[100] = "word1 word2  word3  word4    word5    word6  word7\0";

the explicit terminating zero is redundant. A string literal already ends with an implicit terminating zero, so writing \0 only adds a second one.

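Here you are. The %n conversion specifier stores, in the corresponding int argument, the number of characters sscanf has consumed so far, including any leading whitespace skipped by %s. Adding that count to the pointer moves it to exactly where the previous call stopped. Note that %n does not count toward the value returned by sscanf.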

#include <stdio.h>

int main(void) 
{
    char    string[100] = "word1 word2  word3  word4    word5    word6  word7";
    char s[10];

    const char *p = string;

    for ( int n = 0; sscanf( p, "%9s%n", s, &n ) == 1; p += n )  /* %9s keeps a long word from overflowing s[10] */
    {
        puts( s );
    }

    return 0;
}

The program output is

word1
word2
word3
word4
word5
word6
word7
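
To connect this back to the fields array in the question, here is a minimal sketch of the same %n loop that stores each word instead of printing it. The helper name split_words and the macros MAX_FIELDS and FIELD_LEN are hypothetical names chosen for illustration; the sizes mirror the fields[20][10] declaration in the question. It assumes that a word longer than 9 characters may simply be split across several fields.

```c
#include <stdio.h>

#define MAX_FIELDS 20   /* assumed: matches fields[20][10] in the question */
#define FIELD_LEN  10

/* Hypothetical helper: splits line into whitespace-separated words and
 * stores at most MAX_FIELDS of them in fields. Returns the number stored. */
static int split_words(const char *line, char fields[][FIELD_LEN])
{
    int count = 0;
    int n = 0;

    /* %9s reads at most FIELD_LEN - 1 characters, leaving room for the
     * terminating zero. %n stores how many characters this call consumed,
     * including the leading whitespace that %s skipped. */
    while (count < MAX_FIELDS &&
           sscanf(line, "%9s%n", fields[count], &n) == 1)
    {
        line += n;   /* resume exactly where this call stopped */
        count++;
    }
    return count;
}
```

Called as split_words(string, fields) with the declarations from the question, the function should return the number of words found, so the final printing loop can run up to that count rather than a hard-coded 6.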

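Another approach is to use either the standard function strtok or the pair of standard functions strcspn and strspn. Keep in mind that strtok modifies the string it scans by overwriting delimiters with zero characters, so it must not be applied to a string literal.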

For example

#include <stdio.h>
#include <string.h>

int main(void) 
{
    char    string[100] = "word1 word2  word3  word4    word5    word6  word7";

    const char *delim = " \t";

    const char *p = strtok( string, delim );
    while ( p != NULL )
    {
        puts( p );
        p = strtok( NULL, delim );
    }

    return 0;
}

The program output is the same as shown above.

And here is a demonstration program that uses the standard functions strcspn and strspn.

#include <stdio.h>
#include <string.h>

int main(void) 
{
    char    string[100] = "word1 word2  word3  word4    word5    word6  word7";

    const char *delim = " \t";

    for ( const char *p = string; *p; )
    {
        p += strspn( p, delim );

        const char *q  = p;

        p += strcspn( p, delim );

        int n = p - q;

        if ( n ) printf( "%*.*s\n", n, n, q );
    }

    return 0;
}

Again the output will be the same as shown above.

Note that in this case the extracted words are not zero-terminated, because the source string is left unchanged. To store them in a character array as strings, copy the characters with memcpy and then append the terminating zero yourself.
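
The copying step can be sketched as follows. This is only an illustration under stated assumptions: copy_words, MAX_FIELDS and FIELD_LEN are hypothetical names, the array sizes are taken from the question, and words that do not fit are silently truncated to FIELD_LEN - 1 characters.

```c
#include <string.h>

#define MAX_FIELDS 20   /* assumed: matches fields[20][10] in the question */
#define FIELD_LEN  10

/* Hypothetical helper: finds words with strspn/strcspn, copies each one
 * into fields with memcpy and terminates it. Returns the number stored. */
static int copy_words(const char *line, char fields[][FIELD_LEN])
{
    const char *delim = " \t";
    const char *p = line;
    int count = 0;

    while (*p != '\0' && count < MAX_FIELDS)
    {
        p += strspn(p, delim);            /* skip leading delimiters */
        size_t n = strcspn(p, delim);     /* length of the next word */
        if (n == 0)
            break;                        /* only delimiters remained */

        /* Truncate words that would not fit in one field. */
        size_t len = n < (size_t)(FIELD_LEN - 1) ? n : (size_t)(FIELD_LEN - 1);

        memcpy(fields[count], p, len);    /* copy the characters ... */
        fields[count][len] = '\0';        /* ... then append the terminating zero */
        count++;
        p += n;                           /* move past the whole word */
    }
    return count;
}
```

Unlike strtok, this leaves the input untouched, so it can be used on a const string or a string literal.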

Vlad from Moscow
  • Thanks for the coding examples. The first one is the most elegant. I was a decent C programmer back in the 1990s. Worked a six-month Unix support contract for 8 years. I've discovered that being inactive in the field for two decades resulted in my forgetting more than I remembered. – Jim Feb 13 '20 at 15:40