Read from stdin and fill buffer until EOF

Question

I need to read from stdin and fill a buffer of _SC_PAGESIZE (from sysconf()) until stdin is at EOF. This program is supposed to be a wc clone, so I would be expecting something like the contents of a regular file to be passed in. If the buffer isn't big enough for stdin, then I have to keep filling it, process it for information, then clear it and continue to fill the buffer again from the file offset in stdin. I'm just having a problem with tracking the EOF of stdin, and I'm getting an infinite loop. Here's what I have:

int pSize = sysconf(_SC_PAGESIZE);
char *buf = calloc(pSize, sizeof(char));
assert(buf);
if (argc < 2) {
        int fd;
        while (!feof(stdin)) {
                fd = read(0, buf, pSize);
                if (fd == -1)
                        err_sys("Error reading from file\n");
                lseek(0, pSize, SEEK_CUR);
                if (fd == -1)
                        err_sys("Error reading from file\n");
                processBuffer(buf);
                buf = calloc(pSize, sizeof(char));
        }
        close(fd);
}

I'm assuming the problem has to do with the test condition (while (!feof(stdin)), so I guess what I need is a correct test condition to exit the loop.

Possible duplicate of [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) — Yunnosch, Apr 13 '18 at 23:39
You write `stdin(pipe, redirection, etc.)` which seems odd, unless you recognize that `etc` includes `regular file`. There's absolutely nothing magical about stdin. It's just like any other `FILE *`. Some might even say it's standard. But what seems really strange is that although your questions suggests that you expect stdin to be a non-seekable file, you call `lseek` and don't check if it succeeds! — William Pursell, Apr 13 '18 at 23:44
The loop should be `while (1) { size_t bytes = read(0, buf, pSize); if (0 == bytes) break; };` — Lee Daniel Crocker, Apr 13 '18 at 23:47
See also [why `while(feof(file))` is always wrong](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) — Pablo, Apr 14 '18 at 00:23
`lseek(0, pSize, SEEK_CUR);` can fail. Try `lseek(0, fd, SEEK_CUR);` — chux - Reinstate Monica, Apr 14 '18 at 00:39
`processBuffer(buf);` should be `processBuffer(buf, fd);` to cope with partial buffers. — chux - Reinstate Monica, Apr 14 '18 at 00:40
What should happen if the last `read()` before `EOF` occurs and it is partially filled? — chux - Reinstate Monica, Apr 14 '18 at 00:49

David C. Rankin · Answer 1 · 2018-04-14T00:53:57.160

Why are you using a low-level read instead of opening a FILE *stream and using fgets (or POSIX getline)? Further, you leak memory every time you call:

            buf = calloc(pSize, sizeof(char));

in your loop because you overwrite the address contained in buf losing the reference to the previous block of memory making it impossible to free.

Instead, allocate your buffer once, then continually fill the buffer passing the filled buffer to processBuffer. You can even use a ternary operator to determine whether to open a file or just read from stdin, e.g.

int pSize = sysconf(_SC_PAGESIZE);
char *buf = calloc(pSize, sizeof(char));
assert(buf);

FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) {
    perror ("fopen failed");
    return 1;
}

while (fgets (buf, pSize, fp))
    processBuffer(buf);     /* do not call calloc again -- memory leak */

if (fp != stdin) fclose (fp);   /* close file if not stdin */

(note: since fgets will read a line-at-a-time, you can simply count the number of iterations to obtain your line count -- provided your lines are not longer than _SC_PAGESIZE)

If you want to use exact pSize chunks, then you can use fread instead of fgets. The only effect would be to reduce the number of calls to processBuffer marginally, but it is completely up to you. The only thing that you would need to do is change the while (...) line to:

while (fread (buf, (size_t)pSize, 1, fp) == 1)
    processBuffer(buf);     /* do not call calloc again -- memory leak */

if (ferror(fp))     /* you can test ferror to insure loop exited on EOF */
    perror ("fread ended in error");

(note: like read, fread does not insure a nul-terminated string in buf, so insure that processBuffer does not pass buf to a function expecting a string, or iterate over buf expecting to find a nul-terminating character at the end.)

Look things over and let me know if you have further questions.

Another good point -- especially since we are working with a black-box of `processBuffer`. — David C. Rankin, Apr 14 '18 at 00:33
Would it work if I used memset() to put null '/0' in the buffer instead of calloc-ing like you said? — bmw417, Apr 14 '18 at 00:50
*since `fgets` will read a line-at-a-time, you can simply count the number of iterations to obtain your line count* that's not 100% true, if the line is longer as what `pSize`, then `fgets` will not read the whole line. You should count the read input as a line if the newline is present in `buf`. — Pablo, Apr 14 '18 at 00:50
Them some long lines -- but you is correct again -- updated/qualified. — David C. Rankin, Apr 14 '18 at 00:54

Mohammad Azim · Answer 2 · 2018-04-14T00:04:35.977

2

You can write the loop like

int n;
do {
    n = read(0, buf, pSize);
    // process it
} while(n > 0);

Remember EOF is just one exit condition that may not occur before any other error condition occurs. True check for validity to run the loop is a healthy return code from read. Also, note that condition while(n > 0) is enough or not depends on where you are reading from. In case of stdin it may be enough. But for example for sockets the condition can be written like while(n > 0 || errno == EAGAIN)

edited Apr 14 '18 at 00:04

answered Apr 13 '18 at 23:56

Mohammad Azim

2,604
20
21

1

`n = read(0, buf, pSize); // process it` omits the error case of `n == -1`. Might as well have `if (n <= 0) break; // process it } while (1);` – chux - Reinstate Monica Apr 14 '18 at 00:42
isn't that `while(n > 0)` takes care of any -ve and 0 ret code. – Mohammad Azim Apr 15 '18 at 20:11
1

True, the `while(n > 0)` quits the loop due to negative value, yet _after_ the `// process it` code. I would expect a branch of code immediately after the `n = read(....)` when an error occurs. Perhaps `while ((n = read(0, buf, pSize)) > 0) { ; // process it }` – chux - Reinstate Monica Apr 15 '18 at 22:26
@chux that is indeed correct. Thanks for spotting it. – Mohammad Azim Apr 16 '18 at 15:03

Read from stdin and fill buffer until EOF

2 Answers2