What's the purpose of a Boost pipe and why is it important?

Question

Apologies if this question is overly broad. I'm new to C++ and trying to understand the different stream types and why they matter (or don't matter).

I'm learning by coding a simple program that launches a child process and processes its output. I'm following the Boost.Process synchronous IO example: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_process/tutorial.html#boost_process.tutorial.io.

One of the examples can be reduced to this:

#include <boost/process.hpp>

using namespace std;
using namespace boost::process;

int main(int argc, char *argv[]) {
  opstream in;
  ipstream out;

  child c("c++filt", std_out > out, std_in < in);

  in << "_ZN5boost7process8tutorialE" << endl;

  in.pipe().close(); // This will help c++filt quit, so we don't hang at wait() forever

  c.wait();
  return 0;
}

My question is:

Why do we have to use a Boost opstream? Can I use istringstream instead (apart from the fact that it doesn't compile)? Can I make it compile with istringstream?

The Boost documentation says:

Boost.process provides the pipestream (ipstream, opstream, pstream) to wrap around the pipe and provide an implementation of the std::istream, std::ostream and std::iostream interface.

Does being a pipe matter, i.e. does the fact that it is a pipe have significant implications here?

If you're looking to send just a string/buffer as input, look into the links at the bottom of my answer. — sehe, Feb 07 '21 at 23:05

score 4 · Accepted Answer · answered Feb 07 '21 at 23:02

What Are Processes, How Do They Talk?

Programs interact with their environment in various ways. One set of channels is the standard input, output and error streams.

These are often tied to a terminal or files by a shell (cmd.exe, sh, bash etc).

Now if programs interact with each other, like:

ls | rev

to list files and send the output to another program (rev, which reverses each line), this is implemented with pipes. Pipes are an operating system feature, not a Boost idea. All major operating systems have them.

Fun fact: the | operator used in most shells to indicate this type of output/input redirection between processes is called the PIPE symbol.

What Is A Pipe, Then?

Pipes are basically "magic" file descriptors that refer to an "IO channel" rather than a file. Pipes have two ends: one party writes to one end, the other party reads from the other.
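To make the "two ends" idea concrete, here is a minimal sketch that uses the operating system's pipe directly, without Boost. It assumes a POSIX system (pipe, read, write and close from unistd.h); the helper name roundtrip_through_pipe is made up for this illustration, and both ends live in one process only to keep the demo short.

```cpp
#include <cassert>
#include <cstddef>
#include <limits.h>   // PIPE_BUF (POSIX)
#include <stdexcept>
#include <string>
#include <unistd.h>   // pipe, read, write, close (POSIX)

// Hypothetical helper: pushes `message` into the write end of a fresh pipe
// and returns whatever comes out of the read end.
std::string roundtrip_through_pipe(const std::string& message) {
    // Demo precondition: nobody is reading while we write, so the message
    // must be small enough to fit into the pipe without blocking.
    assert(message.size() <= static_cast<std::size_t>(PIPE_BUF));

    int fds[2];  // fds[0] is the read end, fds[1] is the write end
    if (::pipe(fds) != 0)
        throw std::runtime_error("pipe() failed");

    ssize_t written = ::write(fds[1], message.data(), message.size());
    ::close(fds[1]);  // closing the write end lets the reader see end-of-file
    if (written != static_cast<ssize_t>(message.size())) {
        ::close(fds[0]);
        throw std::runtime_error("write() failed");
    }

    std::string result;
    char buf[256];
    ssize_t n;
    while ((n = ::read(fds[0], buf, sizeof buf)) > 0)
        result.append(buf, static_cast<std::size_t>(n));
    ::close(fds[0]);
    return result;
}
```

Calling roundtrip_through_pipe("hello\n") returns the same string. With a child process, the only difference is that the two ends sit in different processes, which is the plumbing that Boost.Process sets up for you.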

Why?

Two reasons come to mind right away:

1. Files require disk IO and syncing, which makes them slow.

Another fun fact: MS-DOS implemented pipes in terms of temporary files (on disk) for a very long time:

MS-DOS 2.0 introduced the ability to pipe the output of one program as the input of another. Since MS-DOS was a single-tasking operating system, this was simulated by redirecting the first program’s output to a temporary file and running it to completion, then running the second program with its input redirected from that temporary file. Now all of a sudden, MS-DOS needed a location to create temporary files! For whatever reason, the authors of MS-DOS chose to use the TEMP variable to control where these temporary files were created.

2. The pipe enables asynchronous IO. This can be important when processes have two-way (full duplex) IO going on.

Okay, Do I Care?

Yes, no, maybe.

You mostly don't. The ipstream/opstream classes are 100% compatible with std::istream/std::ostream, so if you had a function that expects them:

void simulate_input(std::ostream& os)
{
    for (int i = 0; i < 10; ++i) {
        os << "_ZN5boost7process8tutorialE" << std::endl;
    }
}

You can use it unchanged in your sample:

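namespace bp = boost::process; // alias assumed by this snippet

bp::opstream in;
bp::ipstream out;

bp::child c("c++filt", bp::std_out > out, bp::std_in < in);

simulate_input(in);
in.close(); // close the child's stdin so c++filt sees end-of-file and exits

c.wait();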
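Because simulate_input only depends on std::ostream&, the same function can also be driven without any child process. The sketch below repeats the function and feeds it a std::ostringstream; the helper name capture_simulated_input is made up for this illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Same function as above: it only knows about std::ostream.
void simulate_input(std::ostream& os) {
    for (int i = 0; i < 10; ++i) {
        os << "_ZN5boost7process8tutorialE" << std::endl;
    }
}

// Hypothetical helper: collects in memory what simulate_input would
// otherwise send to the child's standard input.
std::string capture_simulated_input() {
    std::ostringstream oss;
    simulate_input(oss);
    assert(oss.good());  // writing to an in-memory stream should not fail
    return oss.str();
}
```

This is handy for testing the producing side on its own. What the string stream cannot do is deliver the bytes to another process; that part is what the pipe behind opstream adds.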

When You Definitely Need It

You need it in full-duplex situations, where synchronous IO can easily induce a deadlock: each program ends up waiting on the other end of a pipe.
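One ingredient of such deadlocks is that a pipe only holds a limited number of bytes. Once it is full, a blocking write waits until the other side reads. If the parent is stuck writing to the child's input while the child is stuck writing output that the parent is not reading yet, neither makes progress. The sketch below measures that limit. It assumes a POSIX system, and the helper name bytes_until_pipe_is_full is made up for this illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>    // fcntl, F_GETFL, F_SETFL, O_NONBLOCK (POSIX)
#include <unistd.h>   // pipe, write, close (POSIX)

// Hypothetical helper: returns how many bytes a fresh pipe accepts before a
// write would have to wait for a reader. Returns 0 if the setup fails.
std::size_t bytes_until_pipe_is_full() {
    int fds[2];  // fds[0] is the read end, fds[1] is the write end
    if (::pipe(fds) != 0)
        return 0;

    // Make the write end non-blocking, so that a full pipe reports EAGAIN
    // instead of suspending this process the way a blocking write would.
    int flags = ::fcntl(fds[1], F_GETFL);
    if (flags == -1 || ::fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        ::close(fds[0]);
        ::close(fds[1]);
        return 0;
    }

    const char chunk[1024] = {};
    std::size_t total = 0;
    for (;;) {
        ssize_t n = ::write(fds[1], chunk, sizeof chunk);
        if (n <= 0) {
            // -1 with EAGAIN/EWOULDBLOCK means the kernel buffer is full.
            assert(n == 0 || errno == EAGAIN || errno == EWOULDBLOCK);
            break;
        }
        total += static_cast<std::size_t>(n);
    }

    ::close(fds[0]);
    ::close(fds[1]);
    return total;
}
```

The exact number is system-dependent (on Linux the default capacity is commonly 64 KiB). The point is only that it is finite, so a synchronous writer can get stuck as soon as the other side stops reading.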

You can find examples + solution here:

I was thinking streams are already like “pipes”: they can buffer data, and data can be put in one end and taken out the other end. I wasn’t sure why there are istream, ostream, and iostream, because I imagine any stream needs to be able to accept data at one end and give data out at the other end? — Daiwei, Feb 08 '21 at 00:16
That's just how the standard interfaces are broken out. Which is nice for expressing contract: now you know when you pass a stream it will only be written/read. Note that neither istream nor ostream implement buffering. That's done by the underlying buffer, of which there are no in/out flavours - precisely like for pipe/async_pipe — sehe, Feb 08 '21 at 13:12
In effect, think of the pipe as the source (or buffer), and think of istream/ostream as providing a high-level text stream interface on top of it (locale, number formatting etc.) — sehe, Feb 08 '21 at 13:14
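The split between "buffer" and "stream interface" described in these comments can be sketched with the standard library alone. Below, one std::stringbuf plays the role that the pipe plays for ipstream/opstream, and a separate std::ostream and std::istream are layered on top of it. The helper name roundtrip_through_shared_buffer is made up for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: formats `value` through an ostream and parses it back
// through an istream, both sitting on ONE shared std::stringbuf.
int roundtrip_through_shared_buffer(int value) {
    std::stringbuf buffer;         // holds the bytes (default mode is in | out)
    std::ostream writer(&buffer);  // formatting layer for output
    std::istream reader(&buffer);  // parsing layer for input

    writer << value << ' ';        // number formatting happens in the ostream
    int parsed = 0;
    reader >> parsed;              // number parsing happens in the istream
    assert(reader && "parsing back what was just formatted should succeed");
    return parsed;
}
```

Calling roundtrip_through_shared_buffer(42) returns 42: the streams only format and parse, while the bytes themselves travel through the shared buffer.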