I am using the following while loop to read through the lines in a file:

 while (fscanf(fp, "%m[^\n]s", &line) == 1)
    {
        char c;
        fscanf(fp, "%c", &c);
    }

There are two things I noticed:
  1. The while loop works perfectly unless there is a blank line in the file.
    So if a file contains 5 lines and the second line is blank, the loop exits at
    the second line and does not read any further lines.

  2. I am using fscanf(fp, "%c", &c); to consume the trailing newline character,
    which the earlier fscanf call does not consume.

Is there a way to resolve 1, and a better alternative for 2?

achal
  • `getline` or add leading space to the format string `" %m[^\n]s"`. – William Pursell Feb 08 '21 at 13:03
  • Please explain your thinking behind `"%m[^\n]s"` your expectation of what it does. This will reduce necessary guessing and allow to find potential misunderstandings. – Yunnosch Feb 08 '21 at 13:11
  • I didn't know about %m, so this thread made me curious. According to https://stackoverflow.com/questions/38685724/difference-between-ms-and-s-scanf and the page it cites, http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html, %m is a non-standard extension which automatically allocates memory for the corresponding field. However, %m forces the user to deallocate that memory themselves. Therefore, please add deallocation to your code. The scanset takes all chars except \n. You should complete your format string by adding \n after the scanset to scan a complete line. – Wör Du Schnaffzig Feb 08 '21 at 13:18
  • Questions involving extensions such as the `m` modifier should describe them or refer to documentation for them. – Eric Postpischil Feb 08 '21 at 13:22

2 Answers

You really should use getline for this. You could use scanf with something like:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
        int rv;
        FILE *fp = stdin;
        char *line = NULL;
        while( (rv = fscanf(fp, "%m[^\n]", &line)) != EOF ){
                if( rv == 0 ){
                        line = "";
                }
                puts(line);
                fgetc(fp); /* Consume newline */
                if( rv ){
                        free(line);
                }
        }
        return 0;
}

But don't. getline is more widely available than %m, and easier to understand. The %m modifier is an extension that is not part of ISO C; it allocates space for the data, and the caller is responsible for freeing that data. getline is a POSIX function that does exactly what you're trying to do (it allocates space for a buffer to read a full line of text).

Also, note that the conversion specifier is just [, not s, and the format string should not have the trailing s.

To use getline, you can do:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
        FILE *fp = stdin;
        char *line = NULL;
        size_t cap = 0;
        while( getline(&line, &cap, fp) != -1 ){
                fputs(line, stdout);
        }
        free(line);
        return 0;
}

Note that when using getline, the newline stays in the buffer, while the scanf method does not add that character. Another advantage of getline is that you only need to free the buffer once, and it will minimize the allocations.
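If the newline is unwanted, it can be removed after the call. The following is a minimal sketch, not part of the code above: `read_line_chomp` is a hypothetical helper name, and it assumes a POSIX system where `getline` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read one line with getline and drop the trailing
 * newline, if there is one. Returns the length of the line without the
 * newline, or -1 at end of file or on error. The caller frees *line once,
 * after the last call. */
ssize_t read_line_chomp(char **line, size_t *cap, FILE *fp)
{
    ssize_t len = getline(line, cap, fp);

    if (len > 0 && (*line)[len - 1] == '\n') {
        (*line)[--len] = '\0';   /* overwrite the newline */
    }
    return len;
}
```

A blank line comes back as an empty string with length 0, so a loop of the form `while (read_line_chomp(&line, &cap, fp) != -1)` does not stop early on blank lines.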

William Pursell
  • The question and the answer should explain the `m` modifier, as it is otherwise likely to confuse readers who are unaware of this extension. – Eric Postpischil Feb 08 '21 at 13:21
  • My intention in using `%m[^\n]s` in scanf was to read until a newline is encountered. What I understand is that `m` is used to dynamically allocate strings of any length. Is that correct? Also, is getline a better way to achieve the same? `scanf` certainly gets cryptic – achal Feb 08 '21 at 13:24
  • @achal Yes, that is what %m does. But `scanf` is pretty free about discarding whitespace, so it's the wrong tool for this. – William Pursell Feb 08 '21 at 13:26
  • How does one discover the correct method for a particular situation? In this case, suppose I knew only about `scanf`; then I should get some hint in the man pages to move away from `scanf` to `getline`. But I don't see any such thing. Any suggestions on how to look for appropriate methods for the problem at hand? – achal Feb 08 '21 at 13:29
  • could you also give an example of `getline` in this case as I want to read lines of any length. Thanks – achal Feb 08 '21 at 13:35
  • @achal Edited with getline example. I'm not really sure the best way to learn new methods! I suppose mostly by posting on forums and browsing documentation. That's one of life's great questions. – William Pursell Feb 08 '21 at 13:40
  • `getline()` also has the huge advantage that it either reads a line or it doesn't. When `fscanf()` fails to read per the format string, the input stream is in an unknown state and there's no reliable way to recover. – Andrew Henle Feb 08 '21 at 13:41
  • I see your getline one. But what I was trying to do is `*(lines + lineNumber - 1) = line; lines = realloc(lines, (lineNumber + 1) * sizeof(char **)); lineNumber++;` inside the while loop when I was using `fscanf`. I am trying to create an array of all lines of the file, so storing the whole file in a `char**` – achal Feb 08 '21 at 13:46
  • @achal That's easy enough to do. See https://github.com/wrp/examples/blob/main/c/read-txt-into-array.c – William Pursell Feb 08 '21 at 13:50
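
For readers who want the idea without following the link, here is a rough sketch of collecting every line into a `char **` with `getline`. It is not the code from the linked file; `read_lines` and `free_lines` are hypothetical names, and the growth policy (doubling, starting at 16 slots) is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: release an array built by read_lines. */
void free_lines(char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}

/* Hypothetical helper: read all lines of fp into a heap-allocated array.
 * Trailing newlines are stripped. On success returns 0 and stores the
 * array in *out and the number of lines in *count (for an empty file,
 * *out is NULL and *count is 0). Returns -1 if an allocation fails. */
int read_lines(FILE *fp, char ***out, size_t *count)
{
    char **lines = NULL;
    size_t used = 0, slots = 0;
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';
        if (used == slots) {
            /* Grow the pointer array geometrically. */
            size_t want = slots ? slots * 2 : 16;
            char **tmp = realloc(lines, want * sizeof *tmp);
            if (tmp == NULL) {
                free(line);
                free_lines(lines, used);
                return -1;
            }
            lines = tmp;
            slots = want;
        }
        lines[used++] = line;  /* the array now owns this buffer */
        line = NULL;           /* make getline allocate a fresh one */
        cap = 0;
    }
    free(line);  /* getline may have allocated even when it hit EOF */
    *out = lines;
    *count = used;
    return 0;
}
```

Resetting `line` and `cap` after each stored line gives one allocation per line, which trades away the buffer reuse that `getline` normally offers in exchange for not copying each line.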

There are multiple issues in your code:

  • the fscanf format syntax for scansets does not have a trailing s, simply write:

    rv = fscanf(fp, "%m[^\n]", &line)
    
  • fscanf() fails to convert if there is an empty line in the input stream, in which case it returns 0 without allocating anything or modifying *line.

  • the trailing newline must be read separately, for example with getc(fp).

Here is a modified version to handle these cases:

#include <stdio.h>
#include <stdlib.h>

int parse_file(FILE *fp, void (*handler)(const char *p)) {
    int rv;
    char *line;

    while ((rv = fscanf(fp, "%m[^\n]", &line)) != EOF) {
        if (rv == 0) {
            handler("");
        } else {
            handler(line);
            free(line);
        }
        if (getc(fp) == EOF) /* read the pending newline */
            break;
    }
    return 0;
}

chqrlie