The soundest advice is to use fgets() or perhaps POSIX getline() to read lines and then consider using sscanf() to parse each line. You will probably need to consider how to use sscanf() in a loop. There are also numerous other options for parsing the line instead of sscanf(), such as strtok_r() or the less desirable strtok() (or, on Windows, strtok_s()); strspn(), strcspn(), strpbrk(); and other functions that are not as standardized.
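As a rough sketch of the fgets() plus sscanf()-in-a-loop approach, the code below counts the words in a stream. The %n conversion reports how many characters sscanf() consumed, so the next call can resume from there. The names count_words_in_line() and count_words(), the SEPARATORS set, and the buffer sizes are assumptions made for illustration, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Assumed separator set; it must match the scan set used below. */
#define SEPARATORS " ,.\t\n"

/* Hypothetical helper: count the words in one line of text.
 * A word longer than 32 characters is counted in 32-character pieces. */
static int count_words_in_line(const char *line)
{
    char word[33];      /* one more than the %32 width, for the null byte */
    int consumed;       /* set by %n: characters read by this sscanf() call */
    int count = 0;
    const char *p = line + strspn(line, SEPARATORS);   /* skip leading separators */

    /* sscanf() returns 1 while a word is found, EOF at the end of the string. */
    while (sscanf(p, "%32[^ ,.\t\n]%n", word, &consumed) == 1)
    {
        count++;
        p += consumed;                  /* step over the word just read */
        p += strspn(p, SEPARATORS);     /* and over the separators after it */
    }
    return count;
}

/* Hypothetical driver: read the stream line by line with fgets().
 * Lines longer than the buffer are processed in several chunks. */
static int count_words(FILE *fp)
{
    char line[4096];
    int total = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
        total += count_words_in_line(line);
    return total;
}
```

Because each line is parsed separately, a newline can never be swallowed silently: every successful fgets() call marks a line boundary (or a chunk of an overlong line).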
If you feel you must use fscanf(), then you probably need to capture the trailing context. A simple version of that would be:
char c;
while (fscanf(fp, " %32[^ ,.\t\n]%c", word, &c) == 2)
…
This captures the character after the word, assuming there is one. If your file doesn't end with a newline, it is possible a word will be lost, because fscanf() then returns 1 rather than 2 for the last word. It's also rather too easy to miss a newline. For example, if the line ends with a full stop (period) before the newline, then c will hold the '.' and the newline will be skipped by the next iteration of the loop. You could overcome that with:
char s[33];
while (fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s) == 2)
…
Note that the length in the format string must be one less than the length in the variable declaration, to leave room for the terminating null byte!
After a successful call to fscanf(), the string s could contain multiple newlines and blanks and so on. The scanf() family of functions mostly doesn't care about newlines, and the scan set for s would read multiple newlines in a row if that's what's in the data file.
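One common way to keep the width in the format string in step with the array size is to build the format from a macro using preprocessor stringification. This is only a sketch; WORD_MAX, STR(), STR_() and read_first_word() are hypothetical names chosen for illustration.

```c
#include <stdio.h>

#define WORD_MAX 32             /* maximum word length, excluding the null byte */
#define STR_(x) #x              /* turn the argument into a string literal */
#define STR(x)  STR_(x)         /* expand the macro first, then stringify */

/* Hypothetical helper: read the first word of text into word, which
 * must have room for WORD_MAX + 1 characters. Returns the sscanf() result. */
static int read_first_word(const char *text, char word[WORD_MAX + 1])
{
    /* The adjacent literals concatenate to " %32[^ ,.\t\n]". */
    return sscanf(text, " %" STR(WORD_MAX) "[^ ,.\t\n]", word);
}
```

With this arrangement, changing WORD_MAX updates both the buffer size and the conversion width, so the two cannot drift apart.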
If you explicitly capture the status from fscanf(), you can be more sensitive to files that end without a newline (or a punctuation character), or that cause other problems:
char s[33];
int rc;
while ((rc = fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s)) != EOF)
{
    switch (rc)
    {
    case 2:
        /* …proceed as normal, checking s for newlines. */
        break;
    case 1:
        /* …probably an overlong word or EOF without a newline. */
        break;
    case 0:
        /* …probably means the next character is one of comma or dot. */
        /* …spaces, tabs, newlines will be skipped without detection */
        /* …by the leading space in the format string. */
        /* That character has not been consumed: read it (for example */
        /* with getc(fp)) or the loop will never advance. */
        break;
    default:
        assert(0);
        break;
    }
}
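To make the three cases concrete, here is a minimal sketch that counts words and the newlines that appear in the trailing context. The struct counts type and the count_words_and_newlines() function are hypothetical, and the newline count is only approximate in general: newlines skipped by the leading space in the format string are not seen.

```c
#include <stdio.h>

/* Hypothetical result type for this illustration. */
struct counts
{
    int words;
    int newlines;
};

static struct counts count_words_and_newlines(FILE *fp)
{
    struct counts result = { 0, 0 };
    char word[33];      /* one more than the %32 width */
    char s[33];         /* trailing context after the word */
    int rc;

    while ((rc = fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s)) != EOF)
    {
        if (rc == 2)
        {
            /* A word followed by at least one separator. */
            result.words++;
            for (const char *p = s; *p != '\0'; p++)
            {
                if (*p == '\n')
                    result.newlines++;
            }
        }
        else if (rc == 1)
        {
            /* A 32-character piece of an overlong word, or a word at EOF. */
            result.words++;
        }
        else
        {
            /* rc == 0: the next character is a comma or full stop.
             * Consume it so that the loop makes progress. */
            (void)getc(fp);
        }
    }
    return result;
}
```

For input such as "one two.\nthree, four\n" this counts four words and two newlines; a file holding "alpha beta" with no final newline still yields two words, because the last word is handled by the rc == 1 branch.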
If you start to care about !, ?, ;, :, ' or " characters (not to mention ( and )), life gets more complex still. In fact, at that point, the alternatives to sscanf() start looking much better.
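As one illustration of those alternatives, strspn() and strcspn() can split a line on an arbitrary delimiter set without any format strings, so adding more punctuation only means extending one string. This is a sketch under assumptions: next_word() and PUNCT_DELIMS are hypothetical names, and treating the apostrophe as a delimiter will split words such as "don't".

```c
#include <stddef.h>
#include <string.h>

/* Assumed delimiter set: white space plus the punctuation discussed above. */
#define PUNCT_DELIMS " ,.\t\n!?;:'\"()"

/* Hypothetical helper: find the next word at or after p.
 * Returns a pointer to the start of the word and stores its length
 * in *len, or returns NULL if only delimiters remain. The word is
 * not null-terminated; use the length when copying or printing it. */
static const char *next_word(const char *p, const char *delims, size_t *len)
{
    p += strspn(p, delims);         /* skip any leading delimiters */
    if (*p == '\0')
        return NULL;                /* nothing but delimiters left */
    *len = strcspn(p, delims);      /* characters up to the next delimiter */
    return p;
}
```

A caller would loop with something like w = next_word(w, PUNCT_DELIMS, &len), process the len characters at w, and then advance with w += len. Unlike strtok(), this does not modify the line and keeps no hidden state.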
It is very hard to use the scanf() family of functions correctly. They're anything but tools for the novice, at least once you start needing to do anything complex. You could look at A beginner's guide to not using scanf(), which contains much valuable information. I'm not wholly convinced by the last couple of examples, which are supposed to be bomb-proof uses of scanf(). (It is a little easier to use sscanf() correctly, but you still need to understand what you're up to in detail.)