
I'm working on a program that reads text from a file, splits the text into words, and manipulates them. I'm parsing with fscanf like this:

while (fscanf (fp, " %32[^ ,.\t\n]%*c", word) == 1)    
{
    /*manipulate the text word by word */
    …
}

I want to record, next to each word, the line on which I found it.

Is there a way to detect when I have moved down a line while using fscanf?

Jonathan Leffler
  • `fscanf` does not distinguish lines; in fact, most of its format specifiers ignore all whitespace. Try using `fgets` to read each line and then using a string splitting function. – Weather Vane Jun 13 '17 at 20:20
  • ...such as `strsep` or `strtok` and its derivatives, if you want to parse words. – Weather Vane Jun 13 '17 at 20:27
  • `fscanf (fp, " %32[^ ,.\t\n]%*c", word)` skips `'\n'` in various places: the leading `" "` and maybe the `"%*c"`. Use `fgets()` to read a _line_. – chux - Reinstate Monica Jun 13 '17 at 20:30

2 Answers


The soundest advice is to use fgets() or perhaps POSIX getline() to read lines and then consider using sscanf() to parse each line. You will probably need to consider how to use sscanf() in a loop. There are also numerous other options for parsing the line instead of sscanf(), such as strtok_r() or the less desirable strtok() — or, on Windows, strtok_s(); strspn(), strcspn(), strpbrk(); and other functions that are not as standardized.
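As a rough sketch of the line-based approach using strspn() and strcspn(), something like the following could work. The names report_words(), report_file() and DELIMS are made up for illustration, and the code assumes every line fits in the 1024-byte buffer; a longer line would be split by fgets() and counted as more than one line.

```c
#include <stdio.h>
#include <string.h>

#define DELIMS " ,.\t\n"   /* same separators as the question's scan set */

/*
 * Hypothetical helper: print each word in `line` prefixed by `lineno`.
 * Returns the number of words found on the line.
 */
int report_words(const char *line, int lineno, FILE *out)
{
    int count = 0;
    const char *p = line;

    for (;;)
    {
        p += strspn(p, DELIMS);            /* skip leading separators */
        size_t len = strcspn(p, DELIMS);   /* length of the next word */
        if (len == 0)
            break;                         /* nothing left on this line */
        fprintf(out, "%d: %.*s\n", lineno, (int)len, p);
        p += len;
        count++;
    }
    return count;
}

/* Hypothetical driver: read `fp` line by line; returns the total word count. */
int report_file(FILE *fp, FILE *out)
{
    char line[1024];   /* assumption: no line is longer than 1023 characters */
    int lineno = 0;
    int total = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
        total += report_words(line, ++lineno, out);
    return total;
}
```

Because each call to fgets() returns at most one line, the line number is simply the count of successful reads, and no newline can be skipped unnoticed.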

If you feel you must use fscanf(), then you probably need to capture the trailing context. A simple version of that would be:

char c;
while (fscanf(fp, " %32[^ ,.\t\n]%c", word, &c) == 2)
    …

This captures the character after the word, assuming there is one. If your file doesn't end with a newline, it is possible a word will be lost. It's also rather too easy to miss a newline. For example, if the line ends with a full stop (period) before the newline, then c will hold the . and the newline will be skipped by the next iteration of the loop. You could overcome that with:

char s[33];
while (fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s) == 2)
    …

Note that the length in the format string must be one less than the length in the variable declaration!

After a successful call to fscanf(), the string s could contain multiple newlines and blanks and so on. The fscanf() functions mostly don't care about newlines, and the scan set for s would read multiple newlines in a row if that's what's in the data file.
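One way to turn that into line numbers is to count the newlines captured in s after each successful call. This is only a sketch: count_newlines() and number_words() are hypothetical names, and the count can still drift if the file starts with blank lines or contains more than 32 separator characters in a row, because the leading space in the format skips those silently. A final word with no separator after it is also not reported.

```c
#include <stdio.h>

/* Hypothetical helper: count the newline characters in a separator string. */
int count_newlines(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == '\n')
            n++;
    return n;
}

/*
 * Sketch: print each word with the line it was found on, using the
 * two-scan-set format. Returns the number of words printed.
 */
int number_words(FILE *fp, FILE *out)
{
    char word[33];
    char s[33];
    int lineno = 1;
    int words = 0;

    while (fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s) == 2)
    {
        fprintf(out, "%d: %s\n", lineno, word);
        lineno += count_newlines(s);   /* separators may span several lines */
        words++;
    }
    return words;
}
```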

If you explicitly capture the status from fscanf(), you can be more sensitive to files that end without a newline (or a punctuation character), or that cause other problems:

char s[33];
int rc;
while ((rc = fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s)) != EOF)
{
    switch (rc)
    {
    case 2:
        /* …proceed as normal, checking s for newlines. */
        break;
    case 1:
        /* …probably an overlong word or EOF without a newline. */
        break;
    case 0:
        /* …probably means the next character is one of comma or dot. */
        /* …spaces, tabs, newlines will be skipped without detection */
        /* …by the leading space in the format string. */
        /* Consume that character (for example with getc(fp)), or the loop never advances. */
        break;
    default:
        assert(0);    /* requires <assert.h> */
        break;
    }
}

If you start to care about !, ?, ;, :, ' or " characters — not to mention ( and ) — life gets more complex still. In fact, at that point, the alternatives to sscanf() start looking much better.

It is very hard to use the scanf() family of functions correctly. They're anything but tools for the novice, at least once you start needing to do anything complex. You could look at A beginner's guide to not using scanf(), which contains much valuable information. I'm not wholly convinced by the last couple of examples which are supposed to be bomb-proof uses of scanf(). (It is a little easier to use sscanf() correctly, but you still need to understand what you're up to in detail.)

Jonathan Leffler

Read lines with fgets() and then parse them using sscanf:

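char buff[1024];
int lineno = 0;
int offset = 0;
int used = 0;
while (fgets(buff, sizeof(buff), fp)) {
    lineno++;
    offset = 0;
    /* The first %n records the position after the word; the second, after the delimiter (if any). */
    while (sscanf(buff + offset, " %32[^ ,.\t\n]%n%*c%n", word, &used, &used) == 1)
    {
        /* manipulate the text word by word; lineno is the current line */

        offset += used;    /* continue after what was just consumed */
    }
}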

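In the inner loop you must advance the buffer offset after each word so that sscanf() continues from where the previous call stopped; otherwise it rescans the same word forever. One way to do this is the %n conversion, which stores the number of characters consumed so far.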
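To see what %n reports in isolation, here is a small sketch. The helper name next_word() is invented for this example, and it assumes the caller provides a buffer of at least 33 bytes for the word.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy the next word from `p` into `word` (at least
 * 33 bytes) and return how many characters of `p` were consumed,
 * including any leading whitespace. Returns 0 if no word was found.
 */
int next_word(const char *p, char *word)
{
    int used = 0;

    /* %n does not count as a conversion, so success is still a return of 1. */
    if (sscanf(p, " %32[^ ,.\t\n]%n", word, &used) != 1)
        return 0;
    return used;
}
```

A return of 0 means either the end of the string or a comma or full stop at the current position, so the caller has to step over punctuation itself before calling again.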

Parham Alvani