
I want to write programs that behave like Unix utilities. In particular, I want to use them with pipes, e.g.:

grep foo myfile | ./MyTransformation [--args] | cut -f2 | ...

Three aspects make me wonder how to handle I/O:

  1. According to sources like the Useless Use of Cat Award, it is good to support both reading from stdin and reading from a file (at the beginning of a pipeline). How is this best accomplished? I'm used to using the <getopt.h> / <cgetopt> facilities for parsing arguments. I could check whether there is a file argument besides my options and read from it; if not, read from stdin. That would mean that stdin is ignored if an input file is supplied. Is this desirable?

  2. According to this question, C++ synchronizes cout and cin with stdio and hence does not buffer well, which leads to a huge decrease in performance. A solution is to disable synchronization: cin.sync_with_stdio(false);. Should a program intended for use in pipes always disable synchronization with stdio for cin and cout? Or should it avoid cin and cout entirely and use its own form of buffered I/O?

  3. Since cout will be used for program output (unless an output file is specified), status messages (verbosity, e.g. percent done) have to go somewhere else. cerr/stderr seems like an obvious choice. However, status messages are not errors.

In summary, I wonder about the I/O handling of such programs in C++. Can cin and cout be used despite the problems addressed above? Should I/O be handled differently, e.g. reading and writing from/to buffered files where stdin and stdout are the defaults? What would be the recommended way to implement such behavior?

b.buchhold

2 Answers


The standard idiom if there are no options is:

#include <fstream>
#include <iostream>
#include <string>

int returnCode = 0;
char const* progName = "program";   //  set from argv[0] in main

void process( std::istream& );      //  the actual transformation

void
processFile( std::string const& filename )
{
    if ( filename == "-" ) {
        process( std::cin );
    } else {
        std::ifstream in( filename.c_str() );
        if ( !in.is_open() ) {
            std::cerr << progName << ": cannot open " << filename << std::endl;
            returnCode = 1;
        } else {
            process( in );
        }
    }
}

int
main( int argc, char** argv )
{
    progName = argv[0];
    if ( argc == 1 ) {
        processFile( "-" );
    } else {
        for ( int i = 1; i != argc; ++ i ) {
            processFile( argv[i] );
        }
    }
    std::cout.flush();
    return std::cout ? returnCode : 2;
}

There are many variants, however. I found myself doing this so often that I wrote a MultiFileInputStream class whose (template) constructor takes a pair of iterators; it then executes more or less the same code as the above. (All of the significant code is, as usual, in the corresponding streambuf.) Similarly, I have a class to parse out the options (which looks like an immutable std::vector<std::string> once the options have been parsed). So the above would become:

int
main( int argc, char** argv )
{
    CommandLine& args = CommandLine::instance();
    args.parse( argc, argv );
    MultiFileInputStream src( args.begin(), args.end() );
    process( src );
    return ProgramStatus::instance().returnCode();
}

(ProgramStatus is another useful class, which handles error output and the return code. It also flushes std::cout and adjusts the error code when you call returnCode() on it.)
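A minimal sketch of what such a ProgramStatus singleton might look like (the interface below is guessed from the description above, not the author's actual code):

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical sketch of a ProgramStatus-like singleton; the real class
// described above surely has a richer interface.
class ProgramStatus
{
    int myReturnCode;
    ProgramStatus() : myReturnCode( 0 ) {}
public:
    static ProgramStatus& instance()
    {
        static ProgramStatus theInstance;
        return theInstance;
    }
    //  Report an error and remember that something went wrong.
    void error( std::string const& message )
    {
        std::cerr << message << std::endl;
        if ( myReturnCode < 1 ) {
            myReturnCode = 1;
        }
    }
    //  Flush std::cout and fold any output failure into the return code.
    int returnCode()
    {
        std::cout.flush();
        if ( !std::cout ) {
            myReturnCode = 2;
        }
        return myReturnCode;
    }
};
```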

I'm sure that anyone writing Unix filter programs has developed similar classes.

With regards to question 2: sync_with_stdio is a static member of std::ios_base, so you can call it without an object: std::ios_base::sync_with_stdio( false );. I find this less misleading, since the call will affect all iostream objects. If the IO handling is a blocking point, by all means do it, but most of the time, I don't bother. It's rare for such programs to need any sort of optimization. (Note that if you do call sync_with_stdio, then you should not use any C style IO. But I can't see any reason to use it anyway.)
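As a concrete sketch of this advice (the helper names setupFastIO and echoLines are mine, not from any library), a minimal identity filter might look like:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Disable synchronization with C stdio and untie std::cin from std::cout.
// Call once, before any I/O; afterwards, do not mix in C-style stdio calls.
void setupFastIO()
{
    std::ios_base::sync_with_stdio( false );
    std::cin.tie( nullptr );
}

// Copy lines from in to out (an identity filter stands in for the real
// transformation).  Returns false if the final flush failed.
bool echoLines( std::istream& in, std::ostream& out )
{
    std::string line;
    while ( std::getline( in, line ) ) {
        out << line << '\n';
    }
    return static_cast<bool>( out.flush() );
}
```

In main, one would call setupFastIO() first and then `return echoLines( std::cin, std::cout ) ? 0 : 2;`.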

With regards to question 3: error messages go to std::cerr, always. You also want to be sure to return a non-zero return code, even if the error wasn't fatal. Something like:

myprog file1 > tmp && mv tmp file1

is all too common, and if you have some problem and don't generate the output, it's a disaster if you don't return a non-zero error code. (That's why I always flush and then check the status of std::cout. A long, long time ago, a user of my program did the above with a very large file, and the disk was full. It wasn't quite as full afterwards. Since then: always flush std::cout, and check that it worked, before returning OK.)

James Kanze
  • Perfectly answers my questions 1 & 2. Regarding question 3, I was thinking about "verbosity" like warnings or percentages of how much of the file is done. I have seen programs "abuse" stderr for that, but I am not sure if this is good practice or if I should avoid such output altogether – b.buchhold Sep 13 '13 at 14:03
  • @b.buchhold By default, Unix filter programs are not verbose. Most of the time, I'd either avoid such output completely, or make it depend on an option (in which case, it goes to `std::cerr`). In some cases, you might make it dependent on `isatty`, in which case, `std::cout` might be appropriate too (since `isatty` will be false if you're outputting to a pipe). But the general rule is to say nothing as long as everything is OK. – James Kanze Sep 13 '13 at 14:26
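A sketch of that isatty-gated progress reporting (reportProgress is a made-up helper; isatty and fileno are POSIX, so this is Unix-specific):

```cpp
#include <cassert>
#include <cstdio>     // fileno
#include <iostream>
#include <sstream>
#include <unistd.h>   // isatty (POSIX)

// Write a progress message only when a human is watching, i.e. when the
// status stream is a terminal; piped or redirected runs stay silent.
// reportProgress is a hypothetical helper, not a standard function.
void reportProgress( std::ostream& status, bool isTerminal, int percent )
{
    if ( isTerminal ) {
        status << '\r' << percent << "% done" << std::flush;
    }
}
```

A caller would write `reportProgress( std::cerr, isatty( fileno( stderr ) ) != 0, percent );`.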

Are you sure you want to use C++? Most operating systems rely more on C and assembly than C++. If you're going to write apps, then C++ could be a good choice, but an operating system's utilities, shell, and helper programs are usually coded in C. You can look through your Linux or BSD implementation to see how it is done with pipes, standard input, and standard output. If you think C is for you, you could read "The C Programming Language" by Kernighan and Ritchie, which has many examples of how to write a good C program that uses pipes, standard I/O, and arguments.

Niklas Rosencrantz
  • I don't want to write OS utilities. Instead, I want to write applications that work on huge text files (up to a TB). Most of that is I/O bound (parallelization not much of an issue) and can be done sequentially. So far I use several programs that do the transformation given an input and output file (organized in a Makefile with dependencies), but I realized that pipes (and tee) would suit my chain of actions very well. My transformations are not trivial, and C++ including STL templates like vector or Google's dense hashmap is very useful – b.buchhold Sep 13 '13 at 10:36
  • Parts of the kernel of an OS have to be in assembler. But there's never any reason to use C today. (In kernel code, you might not want to use all of the features of C++, but he's obviously not programming in kernel code if he's talking about `std::cin` or `stdin`.) And of course, most of the utilities in Linux aren't really Linux, but GNU, and were written before C++ was readily available. – James Kanze Sep 13 '13 at 13:18
  • This does not really answer the question. Also the opinion of "using `C` is better than `C++` for OS and its utilities" is unfounded. For example, GCC: http://beta.slashdot.org/story/173381 – WiSaGaN Apr 25 '14 at 03:03