I'm debugging some legacy code that uses fscanf with the %[] scanset conversion to read tilde-separated (~) values. The file format is simple: value~value, where a value can be all spaces. If the first value is all spaces, the read fails. Below is a stripped-down sample...

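#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
        char b1[100], b2[100];

        FILE *fp = fopen( "temp_out", "r" );
        if ( fp == NULL )
        {
                perror( "temp_out" );
                return EXIT_FAILURE;
        }

        /* Attempt six value~value reads and report what each one converted. */
        for ( int i = 0; i < 6; i++ )
        {
                int n = fscanf( fp, "%[^~]~%[^\n]\n", b1, b2 );
                printf("fscanf converted %d '%s' '%s'\n", n, n>0?b1:"",n>1?b2:"");
        }
        fclose( fp );
        return EXIT_SUCCESS;
}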

And my input file is...

line1~blah
     ~blah2
blah~    

This is compiled with gcc 8.4.1 on RedHat.

If I run it, I get...

fscanf converted 2 'line1' 'blah'
fscanf converted 0 '' ''
fscanf converted 0 '' ''
fscanf converted 0 '' ''
fscanf converted 0 '' ''
fscanf converted 0 '' ''

Only the first line is converted.

However, if I swap the first 2 input lines...

     ~blah2
line1~blah
blah~    

It works...

fscanf converted 2 '     ' 'blah2'
fscanf converted 2 'line1' 'blah'
fscanf converted 2 'blah' '    '
fscanf converted 0 '' ''
fscanf converted 0 '' ''
fscanf converted 0 '' ''

If I put that line in a string, sscanf works. Reading with fgets and then parsing with sscanf also works, so that is a workaround, but this pattern is used everywhere in the legacy code and I'd like to know what is going on.

Any ideas?

Frank
  • See [What is the effect of trailing white space in a `scanf()` format string?](https://stackoverflow.com/questions/19499060/what-is-the-effect-of-trailing-white-space-in-a-scanf-format-string) and then decide that you won't use trailing white space in the format string. You will be best served by reading lines (`fgets()` or POSIX `getline()`) and then parsing those (`sscanf()`), which you mention as a workaround; it is almost certainly the best (simplest) way to fix the problems. – Jonathan Leffler Aug 30 '21 at 05:32
  • You must use the *field-width* modifier or `fscanf()` is no safer than `gets()` when filling arrays. You would need `"%99[^~]~%99[^\n]"` to prevent writing beyond the end of your array if a long line is encountered. See: [Why gets() is so dangerous it should never be used!](https://stackoverflow.com/q/1694036/3422102) for details on the buffer-overrun issue. – David C. Rankin Aug 30 '21 at 06:39

1 Answer

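The problem is the trailing '\n' in the format. Any whitespace character in a scanf format is a directive that matches any amount of whitespace in the input (including none), not just one newline. After the first line is read, it consumes the newline and also the leading spaces of "     ~blah2", so the next %[^~] starts at the '~'. Because %[ must match at least one character, that is a matching failure: fscanf returns 0 and leaves the '~' unread, so every later call fails the same way. When the all-spaces line comes first, nothing before it swallows the spaces, which is why swapping the lines appears to work.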

If you want to read and discard exactly one \n character, use the %*1[\n] conversion. To discard any run of newlines, use %*[\n] (this skips empty lines, but not lines that consist entirely of whitespace).

Better yet, get rid of fscanf entirely and parse the input with proper tools.

n. m. could be an AI
  • Thank you. I keep forgetting fscanf reads a stream, not line by line. Yes, I 100% agree with you about using a proper parsing library, but that is pretty minor compared to the other issues we are facing. – Frank Aug 30 '21 at 05:24
  • The use of `%*1[\n]` is dubious at best. Any variation of additional whitespace will cause a *matching-failure*. If you are worried about extracting only a single line, then using `fgets()` with `sscanf()` would be a better choice. – David C. Rankin Aug 30 '21 at 06:43