
I'm really stuck on something.

I have a text file, which has 1 word followed by ~100 float numbers. The float numbers are separated by space, tab, or newline. This format repeats several times throughout the text file.

For example, this is what the text file looks like:

one 0.00591 0.07272 -0.78274 ... 
0.0673 ...
0.0897 ...
two 0.0654 ...
0.07843 ...
0.0873 ...
three ...
...
...

This is a snippet of my code:

char word[30];
double a[1000];
double j;

while (!feof(fp))
    {
        fscanf(fp, "%s", word);
        printf("%s\n", word);
        while (!feof(fp) && (fscanf(fp, " %lf", &j)) == 1)
        {
            a[z] = j;
            z++;
            num_of_vectors++;
        }
        z = 0;
    }

The word "nine" in the text file is printed as "ne", and the word "in" doesn't print at all; a floating point number is printed instead.

What am I doing wrong?

Any help would be much appreciated.

Thanks.

M. Averbach
  • Check the result value of `fscanf(fp, "%s", word) ==1` before using `word` in `printf("%s\n", word);` – chux - Reinstate Monica Jan 13 '16 at 03:25
  • Can we see the type declarations for word, a and j? – David Hoelzer Jan 13 '16 at 03:25
  • See [`while (!feof(file))` is always wrong](http://stackoverflow.com/questions/5431941/while-feof-file-is-always-wrong) for information on why your outer loop is wrong. You need to test the result of `fscanf()` — either instead of, or as well as, the result of `feof()`, but testing the result of `fscanf()` is usually sufficient. – Jonathan Leffler Jan 13 '16 at 03:29
  • Note, too, that your inner loop goes infinite when `fscanf()` encounters EOF. Because EOF is not zero, the `while (fscanf(…))` loop will continue indefinitely — until you crash because you aren't checking the array bounds in that loop. The lack of bounds checking could also be why you get the segmentation fault. A better check for the inner loop would be `while (fscanf(…) == 1)`; that will stop both when a word is encountered and on EOF, and will continue while there are numbers to read. – Jonathan Leffler Jan 13 '16 at 03:31
  • I tried adding the fscanf == 1 && !feof checks, and that got rid of the Segmentation Fault, so thank you. But fscanf still returns the incorrect value for two words: "nine" gets printed as "ne", and instead of "in" I get a floating point number – M. Averbach Jan 13 '16 at 03:40
  • Checking the result of `fscanf(fp, " %lf", &j) == 1`? Does `j` have the value of infinity? (Note "ni" are 2 letters of "inf") – chux - Reinstate Monica Jan 13 '16 at 03:45
  • Recommend to instead read lines of text and see if first "word/number" is indeed a word or `double` – chux - Reinstate Monica Jan 13 '16 at 03:49

1 Answer


As per the C standard, in its description of fscanf:

An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.

The likely reason that nine gives you ne is that, when reading a double, nan is one of the acceptable values. Hence the n and i are read to establish that the input is not nan.

Similarly, the word in is a valid prefix of inf, which represents infinity.

The standard also states in a footnote:

fscanf pushes back at most one input character onto the input stream.

so it's quite possible that this is why the i in nine is not being pushed back.

The bottom line is that it's unsafe to assume where the file position will end up when an fscanf operation fails.


One way to fix this is to use ftell to save the file position after each successfully read item, and fseek to move back to that position if the next read fails.

Let's say you have the input file:

one 1 2 3 4 5
nine 9 8 7 6 5
in 3.14159 2.71828

The following code will save and restore file positions to make it work as you wish:

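#include <stdio.h>

int main(void) {
    char buff[50]; double dbl; long pos;
    FILE *fin = fopen("inputFile.txt", "r");
    if (fin == NULL) {
        perror("inputFile.txt");
        return 1;
    }
    // "%49s" leaves room in buff for the terminating '\0'.
    while (fscanf(fin, "%49s", buff) == 1) {
        printf("Got string [%s]\n", buff);
        pos = ftell(fin);                  // position just after the word
        while (fscanf(fin, "%lf", &dbl) == 1) {
            printf("Got double [%f]\n", dbl);
            pos = ftell(fin);              // position just after this number
        }
        // The failed fscanf may have consumed part of the next word
        // (e.g. the "ni" of "nine"), so return to the last good position.
        fseek(fin, pos, SEEK_SET);
    }
    fclose(fin);
    return 0;
}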

By commenting out the fseek, you can see similar behaviour to what you describe:

Got string [one]
Got double [1.000000]
Got double [2.000000]
Got double [3.000000]
Got double [4.000000]
Got double [5.000000]
Got string [ne]
Got double [9.000000]
Got double [8.000000]
Got double [7.000000]
Got double [6.000000]
Got double [5.000000]
Got double [3.141590]
Got double [2.718280]

I consider this solution a little messy, since it has to call ftell after every successful read and fseek after every failed one.


Another way is to read everything as strings and then use sscanf to decide whether each one is a number or a word, as in the following code (using the aforementioned input file):

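#include <stdio.h>

int main(void) {
    char buff[50]; double dbl;
    FILE *fin = fopen("inputFile.txt", "r");
    if (fin == NULL) {
        perror("inputFile.txt");
        return 1;
    }
    // Read every whitespace-separated token as a string first...
    while (fscanf(fin, "%49s", buff) == 1) {
        // ...then classify it; a failed sscanf on a string loses no input.
        if (sscanf(buff, "%lf", &dbl) == 1) {
            printf("Got double [%f]\n", dbl);
        } else {
            printf("Got string [%s]\n", buff);
        }
    }
    fclose(fin);
    return 0;
}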

This works because the text of a floating point value never contains whitespace, so %s always reads a number as a single token, which sscanf can then examine without disturbing the file position.


The output of both programs above is:

Got string [one]
Got double [1.000000]
Got double [2.000000]
Got double [3.000000]
Got double [4.000000]
Got double [5.000000]
Got string [nine]
Got double [9.000000]
Got double [8.000000]
Got double [7.000000]
Got double [6.000000]
Got double [5.000000]
Got string [in]
Got double [3.141590]
Got double [2.718280]

which is basically what was desired.


One thing to be aware of is that scanning something like inf or nan as a double will actually succeed. That is the intended behaviour of the library, and it is how your original code would have behaved had it not had the other issues. If that's not acceptable, you can examine the string before scanning it as a double to make sure it's not one of those special values.

paxdiablo
  • Thank you very much! I used the sscanf method and it printed all words. Thanks again! – M. Averbach Jan 13 '16 at 05:02
  • This solution is not completely satisfactory: it is questionable whether a line starting with `inf` should be parsed as a number instead of a string. The file format needs to be more precisely specified. – chqrlie Jan 17 '16 at 05:27
  • @chqrlie, I could understand that comment if it *was* specced that `inf` should be processed as a string but, given it will now work as per the original (where `inf` would be scanned as a floating value), and that the OP accepted the answer, I have to disagree. However, I'll make a note of it to ensure no-one gets bitten by the behaviour. – paxdiablo Jan 17 '16 at 11:53
  • Well, `inf` other than the first word of the file, of course. That would still be scanned as a string in the original. – paxdiablo Jan 17 '16 at 11:59