Add a big buffer to a pipe between two commands

Question

Given a bash command line of the form

commandA | commandB

I want to add a buffer of size ~1MB that sits between commandA and commandB. I would expect to be able to do this with something of the form

commandA | BUFFER | commandB

but what is the command to use for BUFFER?

Remark: I want to do this in order to decouple the two commands to make them parallelize better. The problem is that commandB processes data in large chunks, which currently means that commandA blocks until commandB is done with a chunk. So everything runs sequentially :-(

score 30 · Accepted Answer · answered Dec 18 '11 at 20:32

BUFFER is called buffer. (man 1 buffer, maybe after apt-get install buffer)

Eugen Rieck

What options should one pass to buffer to get, e.g., a 1 GB buffer? There seem to be many options to configure chunk size, number of chunks, and so on, but it is not clear to me which ones to use to get a simple buffer of a given size. – Suzanne Soy Jun 23 '13 at 12:41

`-m size` is for the total. If you need finer-grained control use `-s blocksize -b blocks` together. – Eugen Rieck Jun 23 '13 at 19:23
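
As a rough sketch of how the two forms relate, assuming the total is simply blocksize times blocks as the comment above suggests: the snippet below computes a 1 MiB total from an illustrative 64 KiB block size and prints the two equivalent-looking command lines without running them. The variable names and the 64 KiB / 16-block split are assumptions chosen for the example; buffer may impose its own limits on block count, so check `man 1 buffer` before relying on larger values.

```bash
#!/usr/bin/env bash
# Illustrative sizes in bytes (assumed values, not defaults of buffer).
blocksize=$((64 * 1024))   # size of one block, for -s
blocks=16                  # number of blocks, for -b
total=$((blocksize * blocks))

echo "total buffer: ${total} bytes"

# Simple form: give only the total size.
echo "commandA | buffer -m ${total} | commandB"

# Fine-grained form: give block size and block count explicitly.
echo "commandA | buffer -s ${blocksize} -b ${blocks} | commandB"
```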

score 25 · Answer 2 · edited Mar 29 '21 at 16:11

There is another tool, pv - pipe viewer:

process1 | pv -pterbTCB 1G | process2

-B specifies the buffer size, here 1 GiB (gibibyte)
-C disables splice, which is required for -B
-T shows the buffer level
-p, -t, -e, -r and -b are the default display switches; they have to be given explicitly because -T is present

pv might be available on systems where mbuffer/buffer is not in the official repositories (such as Arch Linux).

JanKanis

answered Oct 13 '16 at 09:32

Johannes Gerer

I tried using `pv` as buffer, but it sometimes stops reading its input when its output blocks while the buffer is not full. It appears to be a bug, as attaching with `strace` fixes it. `pv` with a buffer also switches between reading and writing every 100 ms or until no more data is ready, but that slowed down reading from disk when I tried it. I guess the 100 ms gap was too long for the system to continue reading ahead. – JanKanis Mar 29 '21 at 15:15

score 8 · Answer 3 · answered Dec 18 '11 at 21:34

You can use

buffer (mentioned above)
mbuffer (works on Solaris too, possibly other UNIXes)

E.g.

    process1 | mbuffer -m 1024M | process2

to use a 1G buffer
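
For scripts that have to run on machines where only one of these tools is installed, a small wrapper can pick whichever is available. This is only a sketch: `big_buffer` is a hypothetical helper name, the size is passed in MiB, and the fallback to plain `cat` (which adds no extra buffering) is an assumption about what you would want when neither tool is found. The `printf` and `tr` stages are placeholders for commandA and commandB.

```bash
#!/usr/bin/env bash
# big_buffer: hypothetical helper, not part of any package.
# Usage: commandA | big_buffer [size_in_MiB] | commandB
big_buffer() {
  local mib="${1:-1}"
  if command -v mbuffer >/dev/null 2>&1; then
    # -q suppresses the status display on stderr; -m sets the total size.
    mbuffer -q -m "${mib}M"
  elif command -v buffer >/dev/null 2>&1; then
    # buffer takes a lowercase 'm' suffix for megabytes.
    buffer -m "${mib}m"
  else
    # Neither tool installed: pass data through unchanged, without buffering.
    cat
  fi
}

# Placeholder pipeline standing in for commandA | BUFFER | commandB.
result=$(printf 'alpha\nbeta\n' | big_buffer 1 | tr 'a-z' 'A-Z')
printf '%s\n' "$result"
```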

sehe

mbuffer seems much better than buffer since buffer is (according to manpage and my experiments) limited to 1GB. – phihag Sep 30 '17 at 09:18

score 4 · Answer 4 · edited Jun 20 '20 at 09:12

The program buffer uses shared memory. This can be a problem: shared memory can outlive the process that allocated it, so if buffer terminates abnormally, the memory may leak.

An alternative may be GNU dd:

commandA |
dd status=none iflag=fullblock bs=1M |
commandB

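It is important to use the fullblock option. Without it, dd passes each short read from the pipe straight through instead of accumulating a full 1 MiB block, and when combined with count= it can copy less data than expected.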

Parameters of dd explained

status=none
    Sets the level of information printed to stderr; 'none' suppresses everything but error messages.

iflag=fullblock
    Accumulates full blocks of input.

bs=1M
    Reads and writes up to 1 MiB at a time (default: 512 bytes).
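
A quick way to convince yourself that the dd stage passes data through intact is to compare checksums with and without it. The following is a minimal sketch assuming GNU dd (for `status=none` and `iflag=fullblock`); `seq` stands in for commandA and `cksum` for commandB, both placeholders.

```bash
#!/usr/bin/env bash
# Checksum of the data piped directly.
direct=$(seq 1 200000 | cksum)

# Checksum of the same data piped through the dd buffer stage.
through_dd=$(seq 1 200000 | dd status=none iflag=fullblock bs=1M | cksum)

if [ "$direct" = "$through_dd" ]; then
  echo "checksums match: $through_dd"
else
  echo "mismatch: '$direct' vs '$through_dd'" >&2
fi
```

This only checks that nothing is lost or altered; it says nothing about how well the two commands overlap in time.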

score 0 · Answer 5 · answered Jun 12 '23 at 13:15

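There is a tool called stdbuf that lets you set the size of the stdio buffer on a command's standard output. Note that this is the buffer inside commandA, not the kernel pipe itself, and it only affects programs that use C stdio. Something like: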

stdbuf -o 1M commandA | commandB

moritz

score -4 · Answer 6 · answered Dec 18 '11 at 21:40

Alternatively, you could use a named pipe and run them in parallel:

mkfifo myfifo
commandB < myfifo &
commandA > myfifo
rm myfifo
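
A slightly more careful sketch of the same idea creates the FIFO in a private temporary directory, waits for the background reader to finish, and cleans up afterwards. The `commandA` and `commandB` functions are placeholders defined only for this example, and writing the reader's output to a `result` file is an assumption made so the outcome can be inspected. A FIFO is still backed by an ordinary kernel pipe, so this changes how the commands are wired together, not how much is buffered.

```bash
#!/usr/bin/env bash
# Placeholder stages; substitute the real commands.
commandA() { seq 1 5; }
commandB() { awk '{ sum += $1 } END { print sum }'; }

# Private directory so the FIFO name cannot clash with existing files.
tmpdir=$(mktemp -d)
fifo="$tmpdir/myfifo"
mkfifo "$fifo"

# Start the reader in the background, then feed it from the writer.
commandB < "$fifo" > "$tmpdir/result" &
reader_pid=$!
commandA > "$fifo"

# Wait for the reader to drain the FIFO before reading its output.
wait "$reader_pid"
result=$(cat "$tmpdir/result")
echo "$result"

rm -rf "$tmpdir"
```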

Samus_

That buffers about 64KB only - not big. – Volker Siegel Aug 04 '14 at 16:08

This solution has the same buffer size as `commandA | commandB`. – mik May 23 '18 at 13:21
