33

Given a bash command line of the form

commandA | commandB

I want to add a buffer of size ~1MB that sits between commandA and commandB. I would expect to be able to do this with something of the form

commandA | BUFFER | commandB

but what is the command to use for BUFFER?

Remark: I want to do this in order to decouple the two commands to make them parallelize better. The problem is that commandB processes data in large chunks, which currently means that commandA blocks until commandB is done with a chunk. So everything runs sequentially :-(

Alex Krauss

6 Answers

30

BUFFER is called buffer. (man 1 buffer, maybe after apt-get install buffer)

Eugen Rieck
  • What options should one pass to buffer to get, e.g., a 1 GB buffer? There seem to be many options for configuring chunk size, number of chunks, and so on, but it is not clear to me which ones to use to get a simple buffer of a given size. – Suzanne Soy Jun 23 '13 at 12:41
  • 2
    `-m size` is for the total. If you need finer-grained control use `-s blocksize -b blocks` together. – Eugen Rieck Jun 23 '13 at 19:23
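
A minimal sketch of the `-m` form from the comment above, assuming the `buffer` utility is installed and accepts size suffixes such as `m`, as its man page describes. `buffered_copy` is a hypothetical helper name, `seq` and `wc -l` are stand-ins for commandA and commandB, and the `cat` fallback only keeps the pipeline working when `buffer` is missing (it adds no buffering).

```bash
# Hypothetical helper: pass stdin to stdout through `buffer` when it is available.
buffered_copy() {
    if command -v buffer >/dev/null 2>&1; then
        # -m: total size of the shared-memory queue (about 1 MB here)
        buffer -m 1m
    else
        # Fallback: plain pass-through without extra buffering
        cat
    fi
}

# seq stands in for commandA, wc -l for commandB
line_count=$(seq 1 1000 | buffered_copy | wc -l)
echo "$line_count"
```

For finer control, the `-s blocksize -b blocks` pair mentioned in the comment would replace `-m` in the helper.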
25

There is another tool, pv - pipe viewer:

process1 | pv -pterbTCB 1G | process2
  • B specifies the buffer size, here 1 gibibyte
  • C disables splice, which is required for B
  • T shows the buffer level
  • pterb are the default display switches needed due to the presence of T

pv might be available on systems where mbuffer/buffer is not in the official repositories (such as arch linux).
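
If the progress display is not wanted, a quiet variant might look like the sketch below. It assumes a `pv` version that supports `-q` (quiet) and `-C` (no splice); `pv_buffer` is a hypothetical helper name, and `head`/`wc -c` are stand-ins for process1 and process2.

```bash
# Hypothetical helper: use pv purely as a buffer, with no display.
pv_buffer() {
    if command -v pv >/dev/null 2>&1; then
        # -q: suppress the progress display
        # -C: disable splice, needed for -B to take effect
        # -B: transfer buffer size (1 MiB here)
        pv -q -C -B 1M
    else
        # Fallback: plain pass-through without extra buffering
        cat
    fi
}

# head stands in for process1, wc -c for process2
byte_count=$(head -c 200000 /dev/zero | pv_buffer | wc -c)
echo "$byte_count"
```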

JanKanis
Johannes Gerer
  • 1
    I tried using `pv` as buffer, but it sometimes stops reading its input when its output blocks while the buffer is not full. It appears to be a bug, as attaching with `strace` fixes it. `pv` with a buffer also switches between reading and writing every 100 ms or until no more data is ready, but that slowed down reading from disk when I tried it. I guess the 100 ms gap was too long for the system to continue reading ahead. – JanKanis Mar 29 '21 at 15:15
8

You can use

  • buffer (mentioned)
  • mbuffer (works on Solaris too, and possibly on other UNIXes)

E.g.

    process1 | mbuffer -m 1024M | process2

to use a 1G buffer
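
For the roughly 1 MB buffer asked about in the question, a sketch along these lines may be enough. It assumes `mbuffer` is installed and that `-q` silences its status output on stderr; `mbuffer_stage` is a hypothetical helper name, and `head`/`wc -c` are stand-ins for process1 and process2.

```bash
# Hypothetical helper: a ~1 MB mbuffer stage that stays silent on stderr.
mbuffer_stage() {
    if command -v mbuffer >/dev/null 2>&1; then
        # -q: quiet, -m: total memory used for the buffer
        mbuffer -q -m 1M
    else
        # Fallback: plain pass-through without extra buffering
        cat
    fi
}

# head stands in for process1, wc -c for process2
byte_count=$(head -c 500000 /dev/zero | mbuffer_stage | wc -c)
echo "$byte_count"
```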

sehe
  • mbuffer seems much better than buffer since buffer is (according to manpage and my experiments) limited to 1GB. – phihag Sep 30 '17 at 09:18
4

The program buffer uses shared memory. This can be a problem: shared memory can outlive the program that allocated it, so memory may leak if the program exits with an error.

An alternative may be GNU dd:

commandA |
dd status=none iflag=fullblock bs=1M |
commandB

It is important to use the fullblock option. Otherwise, dd may lose data when reading from a pipe.

Parameters of dd explained

  • status=none

    Set the level of information to print to stderr; 'none' suppresses everything but error messages

  • iflag=fullblock

    accumulate full blocks of input

  • bs=1M

    read and write up to 1 MiB (1,048,576 bytes) at a time (default: 512 bytes)
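
As a quick sanity check of these parameters, the sketch below pushes an input whose size is deliberately not a multiple of `bs` through the same `dd` invocation and counts what comes out. It assumes GNU coreutils; `head` and `wc -c` are stand-ins for commandA and commandB.

```bash
# 3,000,000 bytes of input: two full 1 MiB blocks plus a partial one
copied_bytes=$(head -c 3000000 /dev/zero |
    dd status=none iflag=fullblock bs=1M |
    wc -c)

# Every input byte should arrive at the consumer
echo "$copied_bytes"   # expect 3000000
```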

ceving
0

There is a tool called stdbuf that lets you set the size of a command's stdio output buffer (this is not the kernel pipe buffer itself), something like:

stdbuf -o 1M commandA | commandB
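
A small sketch of this invocation with stand-in commands, assuming GNU coreutils. Note that, as far as I know, `stdbuf` only has an effect on programs that rely on default C stdio buffering, so whether it helps depends on how commandA writes its output.

```bash
# seq stands in for commandA, wc -l for commandB.
# -o 1M asks for a 1 MiB stdout buffer inside the seq process.
line_count=$(stdbuf -o 1M seq 1 100000 | wc -l)
echo "$line_count"
```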
moritz
-4

Alternatively, you could use a named pipe and run them in parallel:

mkfifo myfifo
commandB < myfifo &
commandA > myfifo
rm myfifo
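
A slightly more defensive sketch of the same idea, with the FIFO placed in a private temporary directory and an explicit `wait` for the reader. The names `fifo_dir`, `fifo`, and `result` are hypothetical, and `seq`/`wc -l` are stand-ins for commandA and commandB. As far as I know, a FIFO is backed by the same kind of kernel pipe buffer as `|`, so this by itself should not add buffer capacity.

```bash
fifo_dir=$(mktemp -d)              # private directory for the FIFO
fifo="$fifo_dir/myfifo"
mkfifo "$fifo"

# Reader (stand-in for commandB) runs in the background
wc -l < "$fifo" > "$fifo_dir/result" &
reader_pid=$!

# Writer (stand-in for commandA) runs in the foreground
seq 1 1000 > "$fifo"

wait "$reader_pid"                 # make sure the reader has finished
line_count=$(cat "$fifo_dir/result")
rm -rf "$fifo_dir"                 # remove the FIFO and the result file
echo "$line_count"
```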
Samus_