0

I have a very strange problem, I'm trying to read a .txt file with C, and the data is structured like this: %s %s %d %d Since I have to read the strings all the way to \n I'm reading it like this:

while(!feof(file)){
        fgets(s[i].title,MAX_TITLE,file);
        fgets(s[i].artist,MAX_ARTIST,file);
        char a[10];
        fgets(a,10,file);
        sscanf(a,"%d %d",&s[i].time.min,&s[i++].time.sec);
    }

However, the very first integer I read in s.time.min shows a random big number.

I'm using the sscanf right now since a few people had a similar issue, but it doesn't help.

Thanks!

EDIT: The integers represent time, they will never exceed 5 characters combined, including the white space between.

Stefan
  • 700
  • 10
  • 22
  • 1
    You MUST check the return value of `sscanf()`. – Iharob Al Asimi May 18 '16 at 20:16
  • 2
    `sscanf(a,"%d %d",&s[i].time.min,&s[i++].time.sec);` --> `sscanf(a,"%d %d",&s[i].time.min,&s[i].time.sec);++i;` and [why-is-while-feof-file-always-wrong](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – BLUEPIXY May 18 '16 at 20:17
  • 1
    You have 2 very common mistakes, the first one is not severe while having `i` and `i++` together is very bad. [Read here for `while(!feof())`](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) and [Read this for the other problem](http://stackoverflow.com/questions/949433/why-are-these-constructs-using-undefined-behavior) – Iharob Al Asimi May 18 '16 at 20:18
  • @BLUEPIXY Wow , this worked. Can you put it as an answer, so I can label it as correct? Also, can you try and explain a little bit as to why this happens only on `.min` in the first read? – Stefan May 18 '16 at 20:22
  • @iharob Thanks on pointing that out, if you have any books you would recommend on good/bad practices like this with examples, exercises, please do suggest! – Stefan May 18 '16 at 20:26
  • 2
    Remove the `while (!feof())` too please. – Iharob Al Asimi May 18 '16 at 20:26
  • 1
    @Stefan The order of evaluation of arguments of function must be not to have decided. – BLUEPIXY May 18 '16 at 21:22
  • 1
    Unless the maximum title is very long, you might want to ensure that the `fgets()` calls read a newline before continuing. You should check that each `fgets()` works as well as `sscanf()`. – Jonathan Leffler May 19 '16 at 00:06

3 Answers3

2

Note, I take your post to be reading values from 3 different lines, e.g.:

%s
%s
%d %d

(primarily evidenced by your use of fgets, a line-oriented input function, which reads a line of input (up to and including the '\n') each time it is called.) If that is not the case, then the following does not apply (and can be greatly simplified)

Since you are reading multiple values into a single element in an array of struct, you may find it better (and more robust), to read each value and validate each value using temporary values before you start copying information into your structure members themselves. This allows you to (1) validate the read of all values, and (2) validate the parse, or conversion, of all required values before storing members in your struct and incrementing your array index.

Additionally, you will need to remove the tailing '\n' from both title and artist to prevent having embedded newlines dangling off the end of your strings (which will cause havoc with searching for either a title or artist). For instance, putting it all together, you could do something like:

void rmlf (char *s);
....
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST = "";
char a[10] = "";
int min, sec;
...
while (fgets (title, MAX_TITLE, file) &&     /* validate read of values */
       fgets (artist, MAX_ARTIST, file) &&
       fgets (a, 10, file)) {

    if (sscanf (a, "%d %d", &min, &sec) != 2) {  /* validate conversion */
        fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
        continue;  /* skip line - tailor to your needs */
    }

    rmlf (title);   /* remove trailing newline */
    rmlf (artist);

    s[i].time.min = min;    /* copy to struct members & increment index */
    s[i].time.sec = sec;
    strncpy (s[i].title, title, MAX_TITLE);
    strncpy (s[i++].artist, artist, MAX_ARTIST);
}

/** remove tailing newline from 's'. */
void rmlf (char *s)
{
    if (!s || !*s) return;
    for (; *s && *s != '\n'; s++) {}
    *s = 0;
}

(note: this will also read all values until an EOF is encountered without using feof (see Related link: Why is “while ( !feof (file) )” always wrong?))


Protecting Against a Short-Read with fgets

Following on from Jonathan's comment, when using fgets you should really check to insure you have actually read the entire line, and not experienced a short read where the maximum character value you supply is not sufficient to read the entire line (e.g. a short read because characters in that line remain unread)

If a short read occurs, that will completely destroy your ability to read any further lines from the file, unless you handle the failure correctly. This is because the next attempt to read will NOT start reading on the line you think it is reading and instead attempt to read the remaining characters of the line where the short read occurred.

You can validate a read by fgets by validating the last character read into your buffer is in fact a '\n' character. (if the line is longer than the max you specify, the last character before the nul-terminating character will be an ordinary character instead.) If a short read is encountered, you must then read and discard the remaining characters in the long line before continuing with your next read. (unless you are using a dynamically allocated buffer where you can simply realloc as required to read the remainder of the line, and your data structure)

Your situation complicates the validation by requiring data from 3 lines from the input file for each struct element. You must always maintain your 3-line read in sync reading all 3 lines as a group during each iteration of your read loop (even if a short read occurs). That means you must validate that all 3 lines were read and that no short read occurred in order to handle any one short read without exiting your input loop. (you can validate each individually if you just want to terminate input on any one short read, but that leads to a very inflexible input routine.

You can tweak the rmlf function above to a function that validates each read by fgets in addition to removing the trailing newline from the input. I have done that below in a function called, surprisingly, shortread. The tweaks to the original function and read loop could be coded something like this:

int shortread (char *s, FILE *fp);
...
    for (idx = 0; idx < MAX_SONGS;) {

        int t, a, b;
        t = a = b = 0;

        /* validate fgets read of complete line */
        if (!fgets (title, MAX_TITLE, fp)) break;
        t = shortread (title, fp);

        if (!fgets (artist, MAX_ARTIST, fp)) break;
        a = shortread (artist, fp);

        if (!fgets (buf, MAX_MINSEC, fp)) break;
        b = shortread (buf, fp);

        if (t || a || b) continue;  /* if any shortread, skip */

        if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
            fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
            continue;  /* skip line - tailor to your needs */
        }

        s[idx].time.min = min;   /* copy to struct members & increment index */
        s[idx].time.sec = sec;
        strncpy (s[idx].title, title, MAX_TITLE);
        strncpy (s[idx].artist, artist, MAX_ARTIST);
        idx++;
    }
...
/** validate complete line read, remove tailing newline from 's'.
 *  returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
 *  if shortread, read/discard remainder of long line.
 */
int shortread (char *s, FILE *fp)
{
    if (!s || !*s) return -1;
    for (; *s && *s != '\n'; s++) {}
    if (*s != '\n') {
        int c;
        while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        return 1;
    }
    *s = 0;
    return 0;
}

(note: in the example above the result of the shortread check for each of the lines that make up and title, artist, time group.)

To validate the approach I put together a short example that will help put it all in context. Look over the example and let me know if you have any further questions.

 #include <stdio.h>
#include <string.h>

/* constant definitions */
enum { MAX_MINSEC = 10, MAX_ARTIST = 32, MAX_TITLE = 48, MAX_SONGS = 64 };

typedef struct {
    int min;
    int sec;
} stime;

typedef struct {
    char title[MAX_TITLE];
    char artist[MAX_ARTIST];
    stime time;
} songs;

int shortread (char *s, FILE *fp);

int main (int argc, char **argv) {

    char title[MAX_TITLE] = "";
    char artist[MAX_ARTIST] = "";
    char buf[MAX_MINSEC] = "";
    int  i, idx, min, sec;
    songs s[MAX_SONGS] = {{ .title = "", .artist = "" }};
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    for (idx = 0; idx < MAX_SONGS;) {

        int t, a, b;
        t = a = b = 0;

        /* validate fgets read of complete line */
        if (!fgets (title, MAX_TITLE, fp)) break;
        t = shortread (title, fp);

        if (!fgets (artist, MAX_ARTIST, fp)) break;
        a = shortread (artist, fp);

        if (!fgets (buf, MAX_MINSEC, fp)) break;
        b = shortread (buf, fp);

        if (t || a || b) continue;  /* if any shortread, skip */

        if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
            fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
            continue;  /* skip line - tailor to your needs */
        }

        s[idx].time.min = min;   /* copy to struct members & increment index */
        s[idx].time.sec = sec;
        strncpy (s[idx].title, title, MAX_TITLE);
        strncpy (s[idx].artist, artist, MAX_ARTIST);
        idx++;
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (i = 0; i < idx; i++)
        printf (" %2d:%2d  %-32s  %s\n", s[i].time.min, s[i].time.sec, 
                s[i].artist, s[i].title);

    return 0;
}

/** validate complete line read, remove tailing newline from 's'.
 *  returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
 *  if shortread, read/discard remainder of long line.
 */
int shortread (char *s, FILE *fp)
{
    if (!s || !*s) return -1;
    for (; *s && *s != '\n'; s++) {}
    if (*s != '\n') {
        int c;
        while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        return 1;
    }
    *s = 0;
    return 0;
}

Example Input

$ cat ../dat/titleartist.txt
First Title I Like
First Artist I Like
3 40
Second Title That Is Way Way Too Long To Fit In MAX_TITLE Characters
Second Artist is Fine
12 43
Third Title is Fine
Third Artist is Way Way Too Long To Fit in MAX_ARTIST
3 23
Fourth Title is Good
Fourth Artist is Good
32274 558212 (too long for MAX_MINSEC)
Fifth Title is Good
Fifth Artist is Good
4 27

Example Use/Output

$ ./bin/titleartist <../dat/titleartist.txt
  3:40  First Artist I Like               First Title I Like
  4:27  Fifth Artist is Good              Fifth Title is Good
Community
  • 1
  • 1
David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • Should the code check that newlines were read by the first two (all three?) `fgets()` calls? You remove the trailing characters if present, but you don't seem to ensure that they are present (and the synchronization will be all wrong if they aren't present). Other than that, your code is pretty similar to what I'd've produced. – Jonathan Leffler May 19 '16 at 06:42
  • Yes, you are 100% correct on that point. Additional validation can be done to insure that the last character in both `title` and `artist` are both `'\n'` to insure all characters fit within `MAX_TITLE` and `MAX_ARTIST` and if not, handle that error. I've got to run to a site inspection, will update when I return. – David C. Rankin May 19 '16 at 13:29
  • @JonathanLeffler, today took quite a bit longer than expected... Regardless, I have tweaked the input routine to protect against a short-read while maintaining the sync of the 3-line read. Doing so significantly changed what was required as far as the validation checks go and I would welcome your thoughts on the result. – David C. Rankin May 20 '16 at 02:26
  • Looks good to me — sorry, it wasn't meant to be quite so demanding. And I've already given the only up-vote I'm allowed to give. There are a few other options that could be considered/mentioned. One would be to use POSIX `getline()`, which always reads a full line unless it runs out of memory. Another would be a specialized wrapper around `fgets()` that uses a local long input buffer (`char buffer[4096];` for example) for the call to `fgets()`, finds the newline (uses `getc()` to read to the newline if there wasn't one), then copies and truncates the data to fit the passed arrays. – Jonathan Leffler May 20 '16 at 03:39
  • One minor benefit of these is that you'd be able to use all the space for the title and artist; as it stands, the newline has to be read into the space for the title/artist and then removed, so you effectively lose (waste?) one character. – Jonathan Leffler May 20 '16 at 03:41
  • Agreed as to all. It is a trade-off, as everything in life or coding is. With a *title, artist, time* file, odds are the maximum line length is known before hand, maybe not, but in the event it is, the buffer can be sized accordingly to avoid the potential for a *short-read*. For any unknown lengths, I've always advocated `getline`, but its strength in dynamic allocation, is ironically also its greatest weakness in not offering a way to limit the read. Either way, the flexibility of C allows you to weight the competing requirements and craft a number of ways to skin the same cat. – David C. Rankin May 20 '16 at 04:02
0

Instead of sscanf(), I would use strtok() and atoi().

Just curious, why only 10 bytes for the two integers? Are you sure they are always that small?

By the way, I apologize for such a short answer. I'm sure there is a way to get sscanf() to work for you, but in my experience sscanf() can be rather finicky so I'm not a big fan. When parsing input with C, I have just found it a lot more efficient (in terms of how long it takes to write and debug the code) to just tokenize the input with strtok() and convert each piece individually with the various ato? functions (atoi, atof, atol, strtod, etc.; see stdlib.h). It keeps things simpler, because each piece of input is handled individually, which makes debugging any problems (should they arise) much easier. In the end I typically spend a lot less time getting such code to work reliably than I did when I used to try to use sscanf().

Daniel Goldfarb
  • 6,937
  • 5
  • 29
  • 61
  • Looks like the problem was (aside from bad practice for while(!feof(file))), that I was parsing with `sscanf` and incrementing i inside the function call. When I put the `i++` in the next line, it worked well. – Stefan May 19 '16 at 14:37
0

Use "%*s %*s %d %d" as your format string, instead...

You seem to be expecting sscanf to automagically skip the two tokens leading up to the decimal digit fields. It doesn't do that unless you explicitly tell it to (hence the pair of %*s).

You can't expect the people who designed C to have designed it the same way as you would. You NEED to check the return value, as iharob said.

That's not all. You NEED to read (and understand reelatively well) the entire scanf manual (the one written by OpenGroup is okay). That way you know how to use the function (including all of the subtle nuances of format strings) and what to do with the return vale.

As a programmer, you need to read. Remember that well.

autistic
  • 1
  • 3
  • 35
  • 80
  • I see what you are saying, but I took the post as meaning he was reading from 3 different lines (that will make a difference for me as well) – David C. Rankin May 19 '16 at 03:10
  • Probably. In that case, he needs to understand what %s really denotes, so he can avoid making that mistake in production code... – autistic May 19 '16 at 03:16
  • I was reading the first 2 lines entirely and as you can see I wasn't expecting `sscanf` to magically skip the lines, since I read the 2 integers in the `char a[]` and use sscanf to parse the chars inside. Thank you for reply! – Stefan May 19 '16 at 14:34