I have a problem with hung processes with my Perl program, and I think I have isolated it to whenever I write significant amounts of data to a pipe.

Below is all of the code that I think is relevant from my program. When the program hangs, it hangs on the line in ResponseConstructor.pm: print { $self->{writer} } $data;.

I've tested with different data sizes, and it doesn't appear to hang at an exact size. It may become more likely with size, though: sizes around 32KB sometimes work, sometimes don't. Every time I've tried a 110KB string it has failed.

I believe I've also ruled out the contents of the data as a cause, because the same data sometimes causes a hang and other times doesn't.

This is the first time I've used pipes in a program, so I'm not sure what to try next. Any ideas?

use POSIX ":sys_wait_h";
STDOUT->autoflush(1);

pipe(my $pipe_reader, my $pipe_writer) or die "cannot pipe: $!";
$pipe_writer->autoflush(1);
my $pid = fork;
if ($pid) {
    #I am the parent
    close $pipe_writer;
    while (waitpid(-1, WNOHANG) <= 0){
        #do some stuff while waiting for child to send data on pipe
    }
    #process the data it got
    open(my $fh, '>', "myoutfile.txt") or die "cannot open myoutfile.txt: $!";
    while ( my $line = <$pipe_reader>){
        print $fh $line;
    }
    close $pipe_reader;
    close $fh;
}
else {
    #I am the child
    die "cannot fork: $!" unless defined $pid;
    close $pipe_reader;
    my $response = ResponseConstructor->new($pipe_writer);

    if ([a condition where we want to return small data]){
        $response->respond('small data');
        exit;
    }
    elsif ([a condition where we want to return big data]){
        $response->respond('imagine this is a really big string');
    }
}

ResponseConstructor.pm:

package ResponseConstructor;

use strict;
use warnings;

sub new {
    my $class = shift;
    my $writer = shift;

    my $self = {
        writer => $writer
    };
    bless($self, $class);
    return($self);
}

#Writes the response then closes the writer (pipe)
sub respond {
    my $self = shift;
    my $data = shift;
    print { $self->{writer} } $data;
    close $self->{writer};
}


1;
Stephen
    Well, yes. Pipes only have a limited size. You need to read some of the data before more can be written. – melpomene Jan 30 '18 at 22:05
  • Is there a way to configure the size? I don't see anything on the documentation for pipe: http://perldoc.perl.org/functions/pipe.html. Also, even if the pipe fills up, why can't the writing program just hang until it's emptied and then resume writing? What I've seen is that it hangs forever, longer than it would take the reading process to process the data. – Stephen Jan 30 '18 at 22:09
  • 1
    No, you can't configure the size. Where is the code that reads from the pipe? As far as I can see from your pseudocode, the parent process doesn't start reading until the child has exited. – melpomene Jan 30 '18 at 22:11
  • The code that reads from the pipe is in the lines beneath "#process the data it got". I added some code explicitly to make it clearer. – Stephen Jan 30 '18 at 22:23
  • That code is never executed because the preceding loop doesn't stop. – melpomene Jan 30 '18 at 22:24
  • It stops once the child process exits. – Stephen Jan 30 '18 at 22:26
  • Your child process can't exit because it's not finished writing data to the pipe. – melpomene Jan 30 '18 at 22:28
  • Oh, if you're talking about the condition in which it crashes, then yes. That's exactly what's happening. Sometimes the child process finishes writing and it all works well; other times the parent process hangs in that loop, and the child process hangs writing. So I guess I can modify the process to read immediately. I didn't do it before because until you mentioned it, I had no idea pipes had a max size. – Stephen Jan 30 '18 at 22:33
  • I'm still a bit nervous though because it seems like I could still hit some max pipe size limit if the parent runs "too slowly." – Stephen Jan 30 '18 at 22:35
  • 2
    There is no crash. If the parent runs too slowly, the child will block writing to the pipe until some data has been consumed by the parent and continue. If the child runs too slowly, the parent will block reading from the pipe until some data has been written to it by the child. – melpomene Jan 30 '18 at 22:37
  • 1
    Oooh, I think I get it now. Sorry, I am new at this. So what was happening before is that the child blocked writing to the pipe, but the parent would never read until the child process exited. It basically deadlocked. As long as the parent is reading a little bit, it should be fine regardless of varying rates. That makes me feel better. Thanks for your help, I really appreciate it. :) – Stephen Jan 30 '18 at 22:41
  • @zdim it's a good metaphor, but I didn't know newlines had a special status. I guess that's in the case where you have `$pipe->autoflush(1)` as in my example? What does that do exactly, and what happens if I have a single line that is larger than the max pipe size? // A separate question is, how do I ensure that my parent process doesn't block reading indefinitely once the child is done writing to the pipe? Will it simply notice that my child ran `close $pipe_writer` and avoid blocking? – Stephen Jan 30 '18 at 22:59
  • Change the order of waitpid and the pipe reading loop. – ikegami Jan 31 '18 at 02:33
  • @Stephen A pipe is "block buffered" so stuff is there to read only once a block (4kB?) has been written or pipe got full (64kB?). In my tests I can only _count on_ flushing at over 32kB and I'd think that the block flushing isn't set in stone (may be about atomicity of writes or such). Regardless -- `autoflush` sets [`$|`](https://perldoc.perl.org/perlvar.html#$|) which "_forces a flush right away and after every write or print on the currently selected output channel_". So after _every_ print from writer the other side can read. – zdim Jan 31 '18 at 06:21
  • @Stephen If nobody is reading then the writer is going to get blocked once the pipe gets full. So if you print a 1Mb string those 64kB get "in the pipe" and it all stops; so such a print will hang -- until something starts reading on the other side. If there is a reader it will be getting 64kB chunks of that string in repeated reads. Now, your parent can "block reading" only if there is nothing being written (or nothing flushed) while it's in `while (<$rd>)`. But while the child writes (and flushes) the parent reads; then the child closes and exits and all is well. – zdim Jan 31 '18 at 06:31
  • @Stephen Finally, while your `waitpid` is non-blocking with `WNOHANG`, it is in a `while` loop which terminates only once (some) child exits. So the parent's spinning until a child exits and then it will start reading -- while the child is blocked after it filled the pipe, sitting at its `print { $self->{writer} } $data`. Precisely deadlocked as you said. So move any `wait` you choose to use (like `waitpid`) after the pipe's been read. – zdim Jan 31 '18 at 06:43
  • @Stephen As for wanting to do something while waiting, you can use [select](https://perldoc.perl.org/functions/select.html) or [IO::Select](http://perldoc.perl.org/IO/Select.html). There are other, more low-level ways, to make pipe reads non-blocking. – zdim Jan 31 '18 at 07:12
  • @zdim a thousand thanks for your generous help here. I have still more questions, but I have created new questions based on them: https://stackoverflow.com/questions/48553585/non-blocking-read-of-a-pipe-while-doing-other-things and https://stackoverflow.com/questions/48553644/several-questions-about-pipes – Stephen Feb 01 '18 at 00:33
  • @Stephen I saw your questions :) A much better way to deal with it than comments. I may write something later, unless someone else covers it. Will delete these extensive and excessive (and imprecise) comments in a few minutes. – zdim Feb 01 '18 at 00:37
  • Maybe you're referring to this: "If there is a reader it will be getting 64kB chunks of that string in repeated reads." I imagine this is imprecise since it may be getting less than that depending on flushing. – Stephen Feb 01 '18 at 00:41
  • @Stephen One thing that may not fit anywhere but that you may find useful, see [this post](https://stackoverflow.com/a/45387316/4653379) – zdim Feb 01 '18 at 00:44
  • @Stephen No, I didn't mean that that there are errors (I don't know of any I mean:), just that it doesn't belong ... but I can surely leave it for as long as you wish. At some point it should go away since now you'll have more carefully and precisely written answers. – zdim Feb 01 '18 at 00:46
  • Btw based on what I read here (https://stackoverflow.com/questions/48553644/several-questions-about-pipes/48554104#48554104) think this is misleading: "A pipe is "block buffered" so stuff is there to read only once a block (4kB?) has been written or pipe got full (64kB?). In my tests I can only count on flushing at over 32kB and I'd think that the block flushing isn't set in stone" The pipe itself isn't block buffered, but rather writes *to* the pipe. Also it seems that whether the pipe is full or semi-full, if it's in the pipe it seems that it is available for reading. – Stephen Feb 02 '18 at 20:41
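Putting the thread of these comments together, here is a minimal, self-contained sketch of the corrected structure: the parent drains the pipe first and only reaps the child afterwards, so the child can never deadlock on a full pipe buffer. The ~200KB payload and the filehandle names are illustrative, not taken from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;   # for autoflush() on a lexical filehandle

pipe(my $pipe_reader, my $pipe_writer) or die "cannot pipe: $!";
$pipe_writer->autoflush(1);

my $pid = fork;
die "cannot fork: $!" unless defined $pid;

my $total = 0;
if ($pid) {
    # Parent: drain the pipe FIRST, so the child can never block
    # forever waiting for a full pipe buffer to be emptied.
    close $pipe_writer;
    while (my $line = <$pipe_reader>) {
        $total += length $line;    # stand-in for real processing
    }
    close $pipe_reader;
    waitpid($pid, 0);              # reap only AFTER the pipe is drained
    print "parent read $total bytes\n";
}
else {
    # Child: write well past the usual 64KB pipe capacity.
    close $pipe_reader;
    print { $pipe_writer } ("x" x 1023) . "\n" for 1 .. 200;   # ~200KB
    close $pipe_writer;
    exit 0;
}
```

The read loop ends when the child closes its end of the pipe and exits, so the parent never blocks indefinitely either.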

1 Answer

You probably shouldn't be ignoring the pipe while the child is writing to it: you can use a select on the pipe (instead of waitpid) to see whether there's any data to read during your waiting loop. If you really do want a larger buffer so you can read everything at once, you can use a socketpair instead of a pipe; you can then use setsockopt to make the buffer as large as you need.
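A minimal sketch of the socketpair approach suggested above. The 256KB figure is an illustrative assumption; the kernel may clamp or round the requested size (and Linux doubles it), so the sketch reads the effective size back with getsockopt rather than trusting the request:

```perl
use strict;
use warnings;
use Socket;

# socketpair() gives two connected endpoints that can be used like a
# pipe, but unlike pipe() the buffer size is tunable via setsockopt.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Request a 256KB send buffer on the writing end.
setsockopt($writer, SOL_SOCKET, SO_SNDBUF, pack("l", 256 * 1024))
    or die "setsockopt: $!";

# Read back what the kernel actually granted.
my $size = unpack("l", getsockopt($writer, SOL_SOCKET, SO_SNDBUF));
print "effective send buffer: $size bytes\n";
```

With a large enough buffer, the child's whole payload fits in the kernel buffer and the write completes even before the parent starts reading, though draining the pipe promptly is still the more robust fix.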

kjpires