
Is it possible to split STDIN between multiple readers, so that it effectively becomes a job queue? I would like to pass each line to exactly one reader. Named pipes almost work, but simultaneous reads interfere:

reader.sh

#!/usr/bin/env bash
while read line
do
  echo $line
done < fifo

writer.sh

#!/usr/bin/env bash
while true
do
  echo "This is a test sentance"
  sleep 1
done

execution:

mkfifo fifo
./reader.sh &
./reader.sh &
./writer.sh > fifo

Occasional output (particularly when the readers and the writer run in separate windows):

This is atetsnac
Ti sats etnesats etne etsnac
isats etnes etsnac
Tisi etsnac
hi etsnac
Ti sats etn
hsi etsnac

Notes:

  • I know there are better approaches, just curious if this could be made to work
  • I assume this isn't a bug, as I've tested both Linux and OS X boxes
  • I'd like one consumer per line, which rules out tee
  • I'd like to consume STDIN, which rules out xargs
  • GNU coreutils split can allocate round robin, but not first available (see the example just after this list)
  • GNU parallel --pipe waits until STDIN closes; I'd like to allocate ASAP
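
For reference, the round-robin allocation mentioned in the notes looks like this with GNU split (an illustrative sketch; the "part" prefix is arbitrary, and -n r/2 deals input lines out alternately rather than to whichever consumer is free):

seq 6 | split -n r/2 - part
cat partaa   # lines 1, 3, 5
cat partab   # lines 2, 4, 6
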
user3769065
  • I reproduced this here (OS X), but I don't understand it. – Barmar Oct 21 '15 at 22:47
  • I think I understand. While writes to a pipe are atomic (as long as they're smaller than `BUFSIZ`), message boundaries are not saved in the pipe. So concurrent readers can each read different parts of an input. The system call for reading from a stream doesn't provide any way to request a whole line as a unit. – Barmar Oct 21 '15 at 22:51
  • See http://stackoverflow.com/questions/20597149/multiple-read-processes-of-the-same-pipe-can-all-read-the-same-message – Barmar Oct 21 '15 at 22:54
  • I think pipes are just the wrong mechanism for this. Message queues or datagram sockets would be better, although they don't have nice interfaces in `bash`. You may have to write a script in Perl, PHP, or Python. – Barmar Oct 21 '15 at 22:56
  • You could invoke GNU Parallel as `sem --id mymutex` as a prefix to your `read` commands so they execute one at a time. – Mark Setchell Dec 01 '15 at 09:35

2 Answers


No, in general it is not possible to do this robustly. Writes to a named pipe smaller than PIPE_BUF (>= 512 bytes on all POSIX systems) are atomic. The problem is that reads are not atomic, and there is no standard (or, AFAIK, non-standard) way to make them atomic. On a blocking read of the pipe, if one or more bytes are available they are read immediately, with the actual number read returned as the return value.

Rochkind, Advanced UNIX Programming, states:

Because there is no guarantee of atomicity you must never allow multiple readers unless you have another concurrency control mechanism in place ... use something like a message queue instead.
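
For what it's worth, one concurrency control mechanism that can be expressed with ordinary shell tools is an advisory lock around each read, in the same spirit as the `sem --id mymutex` suggestion in the comments above. The following is only a rough sketch, not from the book or the question: it assumes flock(1) from util-linux, and queue.lock is an arbitrary lock-file name:

#!/usr/bin/env bash
# Sketch: serialize readers so only one process read()s the pipe at a time.
exec 3< fifo                      # keep one read end open across iterations
while true
do
  {
    flock -x 4                    # block until we hold the exclusive lock
    IFS= read -r line <&3 || exit # read failed: the writer closed the pipe
  } 4> queue.lock
  echo "[$$] $line"               # fd 4 is closed above, releasing the lock
done

While the lock is held, the byte-at-a-time reads performed by the read builtin cannot interleave with another reader, so each consumer gets whole lines; the price is that the readers serialize on every line.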

Having said all that, for fun, it is possible to achieve surprisingly robust behavior even without a lock. The reason the line-based `cat fifo | while read line; do ...; done` approach seems to work is that cat snatches lines from the pipe as soon as they arrive, and the readers are ready to read as soon as writing begins, as you mentioned. Because it reads straight away, it happens to grab lines (plural) at the very line boundaries at which they are being written. In general, though, a line-based approach is not going to be very robust, because the message boundaries are not predictable.

If you write and read in constant-sized chunks <= PIPE_BUF, you'll do better. You're guaranteed never to read more than you ask for, and as long as every write is a constant-sized chunk smaller than PIPE_BUF, there is no reason there should ever be anything other than a whole multiple of a chunk available for reading. However, it is not guaranteed that all available bytes will actually be read; it is not an error for the underlying read system call to return fewer bytes than requested, regardless of how many bytes are actually available:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.

And there may be other peculiar reasons: if the standards don't explicitly say something is guaranteed, and under which conditions, don't assume it is.
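
As an aside, you can check what PIPE_BUF actually is for a given filesystem with getconf, since it is a pathconf value (on Linux it is typically 4096):

getconf PIPE_BUF /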

--

reader.sh:

#!/bin/bash
# Read fixed-size packets: 20 digits plus a trailing newline = 21 bytes,
# matching exactly what writer.sh emits.
while read -N 21 packet
do
  # $packet is left unquoted so word splitting drops the trailing newline.
  echo [$$] $packet
done < fifo

writer.sh:

#!/bin/bash
# Emit fixed-size records: a zero-padded 20-digit number plus a newline.
for ((i = 0; i < 100; i++))
do
  s=$(printf "%020d" "$i")
  echo "$s"
  echo "wrote $s" >&2
done

execution:

mkfifo fifo
./reader.sh &
./reader.sh &
./writer.sh > fifo
spinkus
  • I would suggest rewording the answer. The first sentence suggests there is no easy solution, yet at the end a working solution is provided. I skipped the answer and found the fixed-size-record solution myself. – brablc Aug 07 '19 at 07:26
  • I reworded, but the answer is still you can't do it completely robustly so ... use a msg queue or similar. The solution seems to work most of the time but there may be edge cases where it doesn't. – spinkus Aug 09 '19 at 08:13
  • It is up to you ;-) But I would write it as "Yes, in general it is possible, but there is no robust solution." – brablc Aug 09 '19 at 09:45

You can change the reader to be:

#!/usr/bin/env bash
# Now cat, not the read builtin, performs the read() calls on the fifo,
# and cat reads in large buffered chunks.
cat fifo | while read line
do
  echo $line
done

That way, each read() on the fifo usually returns whole lines rather than fragments of one.

The problem with the other version is that reading from the fifo is done by the built-in read, which uses a one-character buffer, so different characters of the same line can be read by two processes running simultaneously. You can see it with strace:

strace bash -c 'while read line; do echo $line; done < fifo'

cat uses a much bigger buffer to read, so it ends up receiving entire lines. Test it with:

strace cat fifo | while read line; do echo $line; done

However, I do not recommend using it as a job queue, as it does not seem to distribute reads evenly across readers.
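
If you want to see the uneven distribution for yourself, here is a rough test sketch (it assumes the cat-based reader above; the out1/out2 file names are arbitrary, and seq just provides numbered lines):

mkfifo fifo
./reader.sh > out1 &
./reader.sh > out2 &
seq 1000 > fifo
wait
wc -l out1 out2

The split is usually far from even, and if one of cat's buffer boundaries happens to fall mid-line, a line can even end up mangled across the two files.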

Alvaro Gutierrez Perez
  • Also cat will read everything in STDIN, so it will empty an entire multiple-line queue if present. All the single-line commands (head, sed, awk) exit after reading, closing the pipe. Is there a command to continually read an entire line at a time (if only head had a -f --follow option like tail)? – user3769065 Oct 22 '15 at 03:20
  • It is impossible to read _just_ a single line if there are more present in the buffer, as the raw `read()` operation works by reading a given number of bytes, not by reading _until_ some kind of character is found. The programs that read only a line do it by reading one character at a time (as I explained in the answer for the built-in `read`), so they will interleave with others, causing the problem you were trying to solve. – Alvaro Gutierrez Perez Oct 22 '15 at 12:54