How to properly fread & fwrite from & to a pipe

Question

I have this code which acts as a pipe between two shell invocations.

It reads from a pipe, and writes into a different one.

#include <stdio.h>
#include <stdlib.h>


#define BUFF_SIZE (0xFFF)

/*
 *  $ cat /tmp/redirect.txt |less
 */
int main(void)
{
    FILE    *input;
    FILE    *output;
    int     c;
    char    buff[BUFF_SIZE];
    size_t  nmemb;

    input   = popen("cat /tmp/redirect.txt", "r");
    output  = popen("less", "w");
    if (!input || !output)
        exit(EXIT_FAILURE);

#if 01
    while ((c = fgetc(input))  !=  EOF)
        fputc(c, output);
#elif 01
    do {
        nmemb   = fread(buff, 1, sizeof(buff), input);
        fwrite(buff, 1, nmemb, output);
    } while (nmemb);
#elif 01
    while (feof(input) != EOF) {
        nmemb   = fread(buff, 1, sizeof(buff), input);
        fwrite(buff, 1, nmemb, output);
    }
#endif
/*
 * EDIT: The previous implementation is incorrect:
 * feof() return non-zero if EOF is set
 * EDIT2:  Forgot the !.  This solved the problem.
 */
#elif 01
    while (feof(input)) {
        nmemb   = fread(buff, 1, sizeof(buff), input);
        fwrite(buff, 1, nmemb, output);
    }
#endif

    pclose(input);
    pclose(output);

    return  0;
}

I want it to be efficient, so I want to implement it with fread()&fwrite(). There are the 3 way I tried.

The first one is implemented with fgetc()&fputc() so it will be very slow. However it works fine because it checks for EOF so it will wait until cat (or any shell invocation I use) finishes its job.

The second one is faster, but I'm concerned that I don't check for EOF so if there is any moment when the pipe is empty (but the shell invocation hasn't finished, so may not be empty in the future), it will close the pipe and end.

The third implementation is what I would like to do, and it relatively works (all the text is received by less), but for some reason it gets stuck and doesn't close the pipe (seems like it never gets the EOF).

EDIT: Third implementation is buggy. Fourth tries to solve the bug, but now less doesn't receive anything.

How should this be properly done?

`while (feof)` is always wrong. And the usage of `feof()` function is also wrong, it doesn't return EOF. The second one checks for EOF, if the stream is not O_NONBLOCK, the `read` will return 0 only in case of EOF. The fastest would be to alllocate 4K buffer and write and read using that buffer. Reading and writing with one char is inefficient . If you want do one char at a time, use `fgetc` and `fputc`. — KamilCuk, Apr 13 '19 at 18:49
have you tried `select()` and add check for `stdin`..http://man7.org/linux/man-pages/man2/select.2.html — simptri, Apr 13 '19 at 18:49
`fread` will block until either all the data you requested is available, all the write ends of the pipe are closed (eg, the program writing to the other end of the pipe terminates), or an error occurs. — William Pursell, Apr 13 '19 at 18:51
The other option is `fgets` with `fputs` which will copy a line at a time. — user3386109, Apr 13 '19 at 18:53
Maybe related: [reddit why is GNU `yes` so fast](https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes_so_fast/) — KamilCuk, Apr 13 '19 at 18:57
@KamilCuk @WilliamPursell Forgot the `!` in `while(!feof)`. So pipes are infinite buffers? The program will have already finished by the time I read, and I will have a buffer to read from it as long as the output of the program? Are you implying that implementation 2 is correct? — alx - recommends codidact, Apr 13 '19 at 18:59
The [`while(feof)`](https://stackoverflow.com/questions/5431941/why-is-while-feoffile-always-wrong) is still conceptually always wrong. It just happens to work here, because `fwrite` with zero characters is just a no-op. The proper usage of `feof` would be right after you do a operation on the stream, like `while (1) { count = fread(...); if (feof(...)) break; fwrite(...); }` — KamilCuk, Apr 13 '19 at 19:03
`pipes are infinite buffers? ` Pipes in linux usually have 4K buffer, that's why it's best to disable line buffering on them and use a 4K buffer for I/O operations. | `he program will have already finished` = No. The program runs concurrent to yours. If the pipes buffer is full, and the program calls `write(STDOUT_FILENO` the call will block the program until you read from the buffer — KamilCuk, Apr 13 '19 at 19:06
@KamilCuk Ok, now I get how it works (I think) :) So every `fread()` asks the called program to output another `BUFF_SIZE` characters, and after that, the program will pause until I ask for more? So the result will be in no characters being lost because one program runs faster than the other. — alx - recommends codidact, Apr 13 '19 at 19:08
`FILE*` is a _big_ abstraction, but let's take it. If a `FILE*` is opened on a file descriptor that doesn't has the `O_NONBLOCK` flag set, then if you do `fread(buff, sizeof(buff), 1, in);` then it will return `0` or `sizeof(buff)`. If you call `fread(buff, 1, sizeof(buff), in)` then you can get a partial read - it can return any number between `0` and `sizeof(buff)`. — KamilCuk, Apr 13 '19 at 19:17
@KamilCuk Of course I mean `BUFF_SIZE` as an upper limit, if the called program has fewer bytes to output then I will get less. But will it always block until either I have something to read, or the called program finished? And the called program will always pause when the buffer is full? Assuming `0_NONBLOCK`isn't set. — alx - recommends codidact, Apr 13 '19 at 19:28
Yes. It will block, until it receives it all the required data (in `size` blocks) or EOF. — KamilCuk, Apr 13 '19 at 19:50
@KamilCuk, why do you think 4Kb buffer is the best and not the one used internally by `stdio`? (that is supposed to know the filesystem details better than you, and is portable, as in each system you don't have to say about those 4Kb to become more or less) — Luis Colorado, Apr 15 '19 at 16:06

Luis Colorado · Accepted Answer · 2019-04-15T09:51:09.153

First of all, to say that I think you are having problems more with buffering, than with efficiency. That is a common problem when first dealing with the stdio package.

Second, the best (and simplest) implementation of a simple data copier from input to output is the following snippet (copied from K&R first ed.).

while((c = fgetc(input)) != EOF) 
    fputc(c, output);

(well, not a literal copy, as there, K&R use stdin and stdout as FILE* descriptors, and they use the simpler getchar(); and putchar(c); calls.) When you try to do better than this, normally you incur in some false assumptions, as the fallacy of the lack of buffering or the number of system calls.

stdio does full buffering when the standard output is a pipe (indeed, it does full buffering always except when the file descriptor gives true to the isatty(3) function call), so you should do, in the case you want to see the output as soon as it is available, at least, no output buffering (with something like setbuf(out, NULL);, or fflush()) your output at some point, so it doesn't get buffered in the output while you are waiting in the input for more data.

What it seems to be is that you see that the output for the less(1) program is not visible, because it is being buffered in the internals of your program. And that is exactly what is happening... suppose you feed your program (which, despite of the handling of individual characters, is doing full buffering) doesn't get any input until the full input buffer (BUFSIZ characters) have been feeded to it. Then, a lot of single fgetc() calls are done in a loop, with a lot of fputc() calls are done in a loop (exactly BUFSIZ calls each) and the buffer is filled at the output. But this buffer is not written, because it need one more char to force a flush. So, until you get the first two BUFSIZ chunks of data, you don't get anything written to less(1).

A simple, and efficient way is to check after fputc(c, out); if the char is a \n, and flush output with fflush(out); in that case, and so you'll write a line of output at a time.

fputc(c, out);
if (c == '\n') fflush(out);

If you don't do something, the buffering is made in BUFSIZ chunks, and normally, not before you have such an amount of data in the output side. And remember always to fclose() things (well, this is handled by stdio), or you can lose output in case your process gets interrupted.

IMHO the code you should use is:

while ((c = fgetc(input))  !=  EOF) {
    fputc(c, output);
    if (c == '\n') fflush(output);
}
fclose(input);
fclose(output);

for the best performance, while not blocking unnecessarily the output data in the buffer.

BTW, doing fread() and fwrite() of one char, is a waste of time and a way to complicate things a lot (and error prone). fwrite() of one char will not avoid the use of buffers, so you won't get more performance than using fputc(c, output);.

BTW(bis) if you want to do your own buffering, don't call stdio functions, just use read(2) and write(2) calls on normal system file descriptors. A good approach is:

int input_fd = fileno(input); /* input is your old FILE * given by popen() */
int output_fd = fileno(output);

while ((n = read(input_fd, your_buffer, sizeof your_buffer)) > 0) {
    write(output_fd, your_buffer, n);
}
switch (n) {
case 0: /* we got EOF */
    ...
    break;
default: /* we got an error */
    fprintf(stderr, "error: read(): %s\n", strerror(errno));
    ...
    break;
} /* switch */

but this will awaken your program only when the buffer is fully filled with data, or there's no more data.

If you want to feed your data to less(1) as soon as you have one line for less, then you can disable completely the input buffer with:

setbuf(input, NULL);
int c; /* int, never char, see manual page */
while((c == fgetc(input)) != EOF) {
    putc(c, output);
    if (c == '\n') fflush(output);
}

And you'll get less(1) working as soon as you have produced a single line of output text.

What are you exactly trying to do? (This would be nice to know, as you seem to be reinventing the cat(1) program, but with reduced functionality)

For sending efficiently each line, could I use `fgets()` and `fputs()` instead, and `fflush()` after each iteration (similarly to your last snippet)? If so, could you add it to your answer? — alx - recommends codidact, Apr 15 '19 at 10:28
that's not a good idea, because it ties your program to the maximum line size you have defined.... I don't recommend doing that. You need to know what will happen if you receive lines longer than the buffer size you have selected. Believe me, the first sample I wrote has survived about 50 years right now. — Luis Colorado, Apr 15 '19 at 15:38
`fputs()` uses buffering also, so it will hold the output because you are dealing with a fifo, and not with a tty. If you check the number of times each character is copied, you'll see that both solutions are similar. — Luis Colorado, Apr 15 '19 at 15:45
I beg to differ:it is (performancewise) sub-optimal. more than half of the CPU cycles will be spent on the function call overhead. Just benchmark the stuff and compare system time <--> usertime. Just get rid of all these function calls. — wildplasser, Apr 15 '19 at 23:48
@wildplasser, sometimes you have to sacrifice performance with a clear notation. Normally, where you spend all the time is in the readin/writting of the fifos. In this case we can go from (O(n) --> O(2n)) but you have not considered that copying to a local buffer also implies a loop, and then copy back to the output buffer does the same (so we are also in O(2n) already). Obviously, if you do nothing than copy, all the time will be spent there. What's the difference between copying a bunch of chars one by one, or doing them calling a routine (that finally does this copying them one by one) ? — Luis Colorado, Apr 16 '19 at 06:02
@wildplasser, and the difference between calling the function or expand the macro belongs to the case you include or not the `` header file, which is something buried depth in the legacy. Of course that using a function to copy a char is more expensive than doing with only one instruction. But that happens automatically when you `#include `, so it's not an issue here. I think today nobody even thinks of using stdio package without including its header file, so your point in the discussion is clearly out of scope. — Luis Colorado, Apr 16 '19 at 06:09

wildplasser · Answer 2 · 2019-04-14T15:23:07.447

0

Simplest solution:

while (1) {
    nmemb = fread(buff, 1, sizeof buff, input);
    if (nmemb < 1) break; 
    fwrite(buff, 1, nmemb, output);
}

Similarly, for the getc() case:

while (1) {
    c = getc(input);
    if (c == EOF) break;
    putc(c, output);
}

Replacing fgetc() by getc() will give performance equivalent to the fread()case. (getc() will (often) be a macro, avoiding function-call overhead). [just take a look at the generated assembly.

edited Apr 14 '19 at 15:23

answered Apr 13 '19 at 18:50

wildplasser

43,142
8
66
109

`nmemb` is unsigned (`size_t`). Why check `<1`? Just `!` will do. – alx - recommends codidact Apr 13 '19 at 19:36
As the link @KamilCuk posted in a comment to the question proves, buffering can improve speed a lot. – alx - recommends codidact Apr 13 '19 at 19:38
1

WRT `(nmemb<1)`: you are right. (But it won't harm) WRT buffering: In either case, the FILE is buffered. (check the assembly!) – wildplasser Apr 13 '19 at 19:51
Without `-O3` I have very different assembly code. With `-O3`, `getc`&`putc` are optimized to `fread`&`fwrite`. GCC 8 @ Debian – alx - recommends codidact Apr 13 '19 at 20:04
`getc()` is normally the same as `fgetc(stdin)`, so you don't always get that performance. The macro issue affects calling a `getc()` version of the function just to make a second call to `fgetc(stdin)`. – Luis Colorado Apr 15 '19 at 09:08
One advice: do the math. (or: measure) I rest mycase. – wildplasser Apr 16 '19 at 22:45

How to properly fread & fwrite from & to a pipe

2 Answers2