
I have a file with lines of the format:

<string><spaces><string><spaces><string>

I don't know the number of spaces between each string. I would like to parse the line and put each string into a variable (the first string is a name, the second a last name, and the third an ID). I saw that I could use strtok, but I'd prefer not to; I'd rather use a loop that iterates over the line.

I also found out that I can use:

if(fscanf(party_data,"%s %s %s",name,last,id) != 3){
   break;
}

but I think a while loop is better. The problem with a while loop is that I don't know the number of spaces between the strings. My goal is to write a function parseLine that takes three pointers (name, last and id) and parses the line. What should the function look like?

vesii
    You can read some ideas in [these class notes](https://www.eskimo.com/~scs/cclass/notes/sx10h.html). – Steve Summit Aug 27 '19 at 20:52
    How likely is bad input? You may find it better to grab a known line of input with something like `getline()` and then parse each line separately with `sscanf()`. That way one bad line won't ruin the parsing of the rest of the input. Recovering from a botched `fscanf()` is hard - you don't really know where you are in your input stream, and if the stream isn't seekable it's almost impossible to recover. – Andrew Henle Aug 27 '19 at 21:03
    Why not `while (fscanf(party_data, "%s %s %s", name, last, id) == 3) { /* do your stuff */ }` (or better read with `fgets()` then use `sscanf()`) – David C. Rankin Aug 27 '19 at 21:19

2 Answers


A single space character in a format string for scanf (or its cousins such as fscanf, sscanf, vfscanf, and so on) can match an arbitrary amount of white space in the input (including not just spaces, but also tabs, vertical tabs, and new-lines), so your fscanf call is probably fine as it stands now. Oh, except for one detail: you generally want to avoid a bare %s conversion, and use something like:

char dest[16];
scanf("%15s", dest);

That is, you always want to specify the maximum field width, which should be one smaller than the size of the buffer you're supplying, to leave room for the terminating '\0'.
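
Applied to the parseLine function from the question, a minimal sketch might just hand one already-read line (e.g. from fgets) to sscanf with explicit widths. The name parse_line and the 32-byte FIELD_SIZE are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical field size: each destination must hold at least this many bytes. */
#define FIELD_SIZE 32

/* Parse "<name><spaces><last><spaces><id>" from one line of text.
 * Each space in the format matches any run of white space (including none),
 * and %31s stops after FIELD_SIZE - 1 characters, leaving room for the '\0'.
 * Returns 1 if all three fields were found, 0 otherwise. */
int parse_line(const char *line, char *name, char *last, char *id)
{
    return sscanf(line, "%31s %31s %31s", name, last, id) == 3;
}
```

One caveat of the width: a word longer than 31 characters isn't rejected; the rest of it simply becomes the next field.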

If you don't want to use scanf and company, you have a couple of choices. You could start with strspn and strcspn, or you could just use while loops with isspace:
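
For the strspn/strcspn route, a rough sketch: strspn measures the run of white space to skip and strcspn measures the word that follows, so each word can be bracketed and copied with memcpy. The helper names next_word and parse_line_spans, and the exact delimiter set, are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed separator set: the standard white-space characters in the C locale. */
static const char DELIMS[] = " \t\n\v\f\r";

/* Copy the next word at *pos into dst (truncated to size-1 chars and always
 * '\0'-terminated when size > 0), then advance *pos past the word.
 * Returns the word's full length, or 0 when no word is left. */
size_t next_word(const char **pos, char *dst, size_t size)
{
    const char *start = *pos + strspn(*pos, DELIMS); /* skip leading white space */
    size_t len = strcspn(start, DELIMS);             /* bracket the word */

    if (size > 0) {
        size_t ncopy = len < size ? len : size - 1;  /* truncate to fit */
        memcpy(dst, start, ncopy);
        dst[ncopy] = '\0';
    }
    *pos = start + len;
    return len;
}

/* Hypothetical parseLine: returns 1 if all three words were found, else 0. */
int parse_line_spans(const char *line, char *name, size_t nsize,
                     char *last, size_t lsize, char *id, size_t isize)
{
    const char *p = line;

    return next_word(&p, name, nsize) > 0
        && next_word(&p, last, lsize) > 0
        && next_word(&p, id, isize) > 0;
}
```

Returning the full word length rather than the number of characters copied lets a caller detect truncation by comparing it against the buffer size.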

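#include <ctype.h>   /* for isspace */

char *line = /* whatever*/;
/* first, second and third point to buffers big enough for each word */

while (*line && !isspace((unsigned char)*line))   /* copy the first word */
   *first++ = *line++;
*first = '\0';

while (isspace((unsigned char)*line))             /* skip the separating white space */
    ++line;

while (*line && !isspace((unsigned char)*line))   /* copy the second word */
    *second++ = *line++;
*second = '\0';

while (isspace((unsigned char)*line))
    ++line;

while (*line && !isspace((unsigned char)*line))   /* copy the third word */
    *third++ = *line++;
*third = '\0';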

In real use, you'd also want to keep track of the length of the destination buffer, and only copy as much into it as it can actually hold (or else figure up the size each needs, and allocate accordingly).

Oh, and one other minor detail: when you call isspace, you should really cast its operand to unsigned char. Without the cast, a character such as an accented letter or an umlaut can be stored as a negative char value (where char is signed), and passing that to isspace gives undefined behavior.
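
Following both of those points (tracking the destination size and casting to unsigned char), one possible bounded version of the loops above, shaped like the parseLine function from the question. The names copy_word and parse_line_loop are hypothetical, and every size is assumed to be at least 1.

```c
#include <ctype.h>
#include <stddef.h>

/* Skip white space, then copy one word into dst, writing at most size-1
 * characters plus '\0' (assumes size > 0). Characters that don't fit are
 * skipped, so the next call starts at the following word.
 * Returns a pointer just past the word. */
static const char *copy_word(const char *s, char *dst, size_t size)
{
    size_t n = 0;

    while (isspace((unsigned char)*s))            /* skip separating white space */
        ++s;
    while (*s && !isspace((unsigned char)*s)) {   /* copy until white space or end */
        if (n + 1 < size)
            dst[n++] = *s;
        ++s;
    }
    dst[n] = '\0';
    return s;
}

/* Hypothetical parseLine: returns 1 if all three fields are non-empty, else 0. */
int parse_line_loop(const char *line, char *name, size_t nsize,
                    char *last, size_t lsize, char *id, size_t isize)
{
    line = copy_word(line, name, nsize);
    line = copy_word(line, last, lsize);
    copy_word(line, id, isize);
    return name[0] != '\0' && last[0] != '\0' && id[0] != '\0';
}
```

Silently dropping the characters that don't fit is just one choice; you could instead report an error when a word is too long for its buffer.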

Jerry Coffin

This is one of the most fundamental things you must do in any language:

  1. read data from a file, and
  2. parse that data into needed information.

In your case you have whitespace-separated names and an ID in your input file. While you can use fscanf directly, it is horribly fragile. If a single line does not match your format string, your read will fail with a matching failure, character extraction from the stream ceases, and you are then left with the remainder of the line in your input stream to deal with before you can move forward.

For that reason, a better approach is to read each line into a sufficiently sized buffer with fgets (or with POSIX getline), consuming an entire line of input with every read. Then you can parse the needed information from the line stored in the buffer without affecting your read operation. This also lets you independently validate (1) the read, and (2) the parse of the information.
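
To see the difference concretely, here's a small sketch that feeds the same lines, including a short "..." line, to both approaches. The struct rec, the functions read_fscanf and read_fgets, and the 32-byte fields are hypothetical. With this particular all-%s format the short line doesn't even produce a matching failure: fscanf just keeps reading into the next line to fill the missing fields, so the records silently go out of step, while the fgets/sscanf version skips the bad line on its own.

```c
#include <stdio.h>

#define NFIELD 32   /* hypothetical field size for this sketch */

typedef struct { char name[NFIELD], last[NFIELD], id[NFIELD]; } rec;

/* Read records directly with fscanf: a short line is not detected,
 * its missing fields are taken from the following line. */
size_t read_fscanf(FILE *fp, rec *r, size_t max)
{
    size_t n = 0;

    while (n < max && fscanf(fp, "%31s %31s %31s", r[n].name, r[n].last, r[n].id) == 3)
        n++;
    return n;
}

/* Read one whole line with fgets, then parse it with sscanf:
 * a line without three fields is skipped without disturbing the rest. */
size_t read_fgets(FILE *fp, rec *r, size_t max)
{
    char buf[1024];
    size_t n = 0;

    while (n < max && fgets(buf, sizeof buf, fp))
        if (sscanf(buf, "%31s %31s %31s", r[n].name, r[n].last, r[n].id) == 3)
            n++;
    return n;
}
```

Given "George Washington 1\n...\nJohn Adams 2\n", both functions store two records, but the second record from read_fscanf is "...", "John", "Adams".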

There are many ways to parse the needed information from the buffer. You can use sscanf reading from the buffer (much like you would have used fscanf on the input itself), you can walk-a-pair-of-pointers down the buffer, bracketing each word and then memcpy and nul-terminate, you can use strtok (but it modifies the original buffer), or you can use a combination of strspn and strcspn to bracket each word similar to walking the pointers.
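
To make the strtok caveat concrete, a short sketch: strtok writes a '\0' over each delimiter it stops at, so after splitting, the buffer itself reads as just the first word and the returned fields point into it. split_strtok is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Split buf in place with strtok, storing up to max field pointers.
 * strtok overwrites delimiters in buf with '\0', so the fields stay valid
 * only as long as buf does, and buf no longer holds the original line.
 * Returns the number of fields stored. */
size_t split_strtok(char *buf, char *field[], size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(buf, " \t\n"); tok && n < max; tok = strtok(NULL, " \t\n"))
        field[n++] = tok;
    return n;
}
```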

In your case let's just use sscanf, since for a fixed format it is just as easy. To store the three strings (name, last, id), create a struct with those members; then you can create an array of struct (we will leave the dynamic array or linked list for later) and store all the names and IDs you read, for example:

#include <stdio.h>

#define MAXID  16   /* if you need a constant, #define one (or more) */
#define MAXNM  32
#define MAXPN 128
#define MAXC 1024

typedef struct {
    char name[MAXNM],
         last[MAXNM],
         id[MAXID];
} typeperson;

You now have a struct (with a convenient typedef, typeperson) that you can use to create an array of struct, with each element initialized to all zero, e.g.

int main (int argc, char **argv) {

    char buf[MAXC];
    size_t n = 0;
    typeperson person[MAXPN] = {{"", "", ""}};

You now have an array of MAXPN (128) person structs to fill. Now simply open your file using the name provided as the first argument to your program (or read from stdin by default if no argument is given) and validate that the file is open for reading:

    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

With your file open and validated, you can now read each line into buf and then parse name, last, id from buf using sscanf. All conversion specifiers except "%c" and "%[...]" (and technically "%n", which doesn't extract anything from the buffer) skip leading whitespace, which lets you separate name, last and id regardless of the amount of whitespace between them:
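
A quick way to see that exception in action: %c takes whatever character comes next, while putting a space in front of it (" %c") skips the whitespace first. The helper first_char is hypothetical, just for the demonstration.

```c
#include <stdio.h>

/* Return the character a %c conversion picks up from text.
 * skip_ws selects " %c" (the leading space skips white space first)
 * instead of plain "%c" (which takes the very next character). */
char first_char(const char *text, int skip_ws)
{
    char c;

    if (sscanf(text, skip_ws ? " %c" : "%c", &c) != 1)
        c = '?';   /* nothing left to read */
    return c;
}
```

With "   x", plain "%c" yields ' ' while " %c" yields 'x'. %s needs no such leading space, because it already skips white space on its own.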

    /* protect array bounds and read each line into struct */
    while (n < MAXPN && fgets (buf, MAXC, fp)) {
        if (sscanf (buf, "%31s %31s %15s",   /* widths: MAXNM-1, MAXNM-1, MAXID-1 */
                    person[n].name, person[n].last, person[n].id) == 3)
            n++;
    }

(note: the test n < MAXPN is what protects your array bounds, preventing you from writing more elements than you have storage for)

What happens if a line has the wrong format? How do you recover? Simple. Because each read consumes a whole line, any line that doesn't match your sscanf format string is quietly ignored and causes you no problem.

All that remains is closing the file and using your data in any way you need. Putting it together in a short example, you could do:

#include <stdio.h>

#define MAXID  16   /* if you need a constant, #define one (or more) */
#define MAXNM  32
#define MAXPN 128
#define MAXC 1024

typedef struct {
    char name[MAXNM],
         last[MAXNM],
         id[MAXID];
} typeperson;

int main (int argc, char **argv) {

    char buf[MAXC];
    size_t n = 0;
    typeperson person[MAXPN] = {{"", "", ""}};
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    /* protect array bounds and read each line into struct */
    while (n < MAXPN && fgets (buf, MAXC, fp)) {
        if (sscanf (buf, "%31s %31s %15s",   /* widths: MAXNM-1, MAXNM-1, MAXID-1 */
                    person[n].name, person[n].last, person[n].id) == 3)
            n++;
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output the results */
        printf ("person[%3zu] : %-20s %-20s %s\n",
                i, person[i].name, person[i].last, person[i].id);
}

Example Input File

With an intentional line that does not match the format (e.g. "..."):

$ cat dat/peopleid.txt
George      Washington          1
John        Adams               2
Thomas      Jefferson           3
James       Madison             4
...
Royal       Embarrasment        45

Example Use/Output

$ ./bin/struct_person < dat/peopleid.txt
person[  0] : George               Washington           1
person[  1] : John                 Adams                2
person[  2] : Thomas               Jefferson            3
person[  3] : James                Madison              4
person[  4] : Royal                Embarrasment         45

Look things over and let me know if you have any further questions.

David C. Rankin