16

Rationale

I try to avoid assignments in C++ code completely. That is, I use only initialisations and declare local variables as const whenever possible (i.e. always except for loop variables or accumulators).

Now, I’ve found a case where this doesn’t work. I believe this is a general pattern but in particular it arises in the following situation:

Problem Description

Let’s say I have a program that loads the contents of an input file into a string. You can either call the tool by providing a filename (tool filename) or by using the standard input stream (cat filename | tool). Now, how do I initialise the string?

The following doesn’t work:

bool const use_stdin = argc == 1;
std::string const input = slurp(use_stdin ? static_cast<std::istream&>(std::cin)
                                          : std::ifstream(argv[1]));

Why doesn’t this work? Because the prototype of slurp needs to look as follows:

std::string slurp(std::istream&);

That is, the argument i non-const and as a consequence I cannot bind it to a temporary. There doesn’t seem to be a way around this using a separate variable either.

Ugly Workaround

At the moment, I use the following solution:

std::string input;
if (use_stdin)
    input = slurp(std::cin);
else {
    std::ifstream in(argv[1]);
    input = slurp(in);
}

But this is rubbing me the wrong way. First of all it’s more code (in SLOCs) but it’s also using an if instead of the (here) more logical conditional expression, and it’s using assignment after declaration which I want to avoid.

Is there a good way to avoid this indirect style of initialisation? The problem can likely be generalised to all cases where you need to mutate a temporary object. Aren’t streams in a way ill-designed to cope with such cases (a const stream makes no sense, and yet working on a temporary stream does make sense)?

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • 1
    Why `static_cast` is needed here? – n. m. could be an AI Feb 02 '12 at 11:05
  • @n.m.: The compiler can't see through the `?:`. Both sides of the `:` must be the same type. – David Schwartz Feb 02 '12 at 11:20
  • "Aren’t streams in a way ill-designed?" Yes, very much so. – Mike Seymour Feb 02 '12 at 11:25
  • @VJovic I’s not really relevant for the question but it’s just reading until it reaches the end of the stream, and storing the result in one contiguous string. – Konrad Rudolph Feb 02 '12 at 11:42
  • I guess the main issue is that C++ was not made with this style in mind. In a Haskell tool, I replaced stdin with a file stream via a recursive function when a filename was passed, but I don't think this is appropriate here. – stefaanv Feb 02 '12 at 12:18
  • @LightnessRacesinOrbit: would that work with `std::cin`? – Mike Seymour Feb 02 '12 at 12:35
  • @Mike: Just `std::move(std::cin)`. It's no harm if you catch it by rvalue ref and don't actually move it. – Xeo Feb 02 '12 at 12:37
  • *"I try to avoid assignments in C++ code completely."* Is this a recognised practice? I have never heard of this. – Chris Burt-Brown Feb 02 '12 at 13:29
  • @Chris I recognise it. ;-) But while it may not be extremely common in C++, it’s actually one of the best consequences of having the `const` keyword in the language. It allows for extremely strictly checked code. Besides, I’m certainly not the only one advocating it. – Konrad Rudolph Feb 02 '12 at 13:43
  • @ChrisBurt-Brown: It's certainly recognised. It may not be very common, but it's certainly neither unheard-of nor silly. The language is backwards: variables should be immutable by default and require `mutable` to mutate, and then programs would be far easier to write correctly; alas, that is not the way of C++. Some choose to try to remember to write `const` most of the time to fake it. – Lightness Races in Orbit Feb 02 '12 at 14:02
  • @MikeSeymour: Why would it not? – Lightness Races in Orbit Feb 02 '12 at 14:04
  • @Xeo: This is why `std::move` is such a ridiculously stupid name >.< It should be `std::get_rvalue` or something! – Lightness Races in Orbit Feb 02 '12 at 14:04
  • @Lightness: Well, you can also use `std::forward(std::cin)` in this case, which will also create an rvalue. – Xeo Feb 02 '12 at 14:33
  • @Xeo: What's the difference between the two? – Lightness Races in Orbit Feb 02 '12 at 14:38
  • @LightnessRacesinOrbit Difference between `forward` and `move`: http://thbecker.net/articles/rvalue_references/section_08.html (Essentially, `forward` returns an ordinary reference for lvalues while `move` always returns an rvalue reference). – Konrad Rudolph Feb 02 '12 at 14:47
  • Ah yes, that'd do it. Sounds like a good solution then? – Lightness Races in Orbit Feb 02 '12 at 15:14
  • @LightnessRacesinOrbit Not unless somebody explains to me how/when the memory of the resources is released (see Xeo’s answer and my comment there). RAII implies that resources must be bound to exactly one owning identifier and which identifier owns what object here isn’t clear to me. – Konrad Rudolph Feb 02 '12 at 15:28
  • @Lightness: Whenn `forward` is instantiated with a non-reference type, it's as if template argument deduction deduced `T&&` to `int&&` for an rvalue argument. As such, `forward(the_val)` will forward as an rvalue, while `forward(the_val)` will forward as an lvalue. See also [this](http://stackoverflow.com/a/8527373/500104), [this](http://stackoverflow.com/a/8860299/500104) and [this](http://stackoverflow.com/a/8862379/500104) answer. – Xeo Feb 02 '12 at 17:13
  • @Konrad: If it's not clear to you, then it certainly won't be clear to people using your code. So I'd just avoid the entire thing. – Lightness Races in Orbit Feb 02 '12 at 17:18

4 Answers4

14

Why not simply overload slurp?

std::string slurp(char const* filename) {
  std::ifstream in(filename);
  return slurp(in);
}

int main(int argc, char* argv[]) {
  bool const use_stdin = argc == 1;
  std::string const input = use_stdin ? slurp(std::cin) : slurp(argv[1]);
}

It is a general solution with the conditional operator.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • +1 An excellent solution. I'm using it a lot in Python at present, but curiously enough, it didn't occur to me to do the same in C++. – James Kanze Feb 02 '12 at 13:03
11

The solution with the if is more or less the standard solution when dealing with argv:

if ( argc == 1 ) {
    process( std::cin );
} else {
    for ( int i = 1; i != argc; ++ i ) {
        std::ifstream in( argv[i] );
        if ( in.is_open() ) {
            process( in );
        } else {
            std::cerr << "cannot open " << argv[i] << std::endl;
    }
}

This doesn't handle your case, however, since your primary concern is to obtain a string, not to "process" the filename args.

In my own code, I use a MultiFileInputStream that I've written, which takes a list of filenames in the constructor, and only returns EOF when the last has been read: if the list is empty, it reads std::cin. This provides an elegant and simple solution to your problem:

MultiFileInputStream in(
        std::vector<std::string>( argv + 1, argv + argc ) );
std::string const input = slurp( in );

This class is worth writing, as it is generally useful if you often write Unix-like utility programs. It is definitly not trivial, however, and may be a lot of work if this is a one-time need.

A more general solution is based on the fact that you can call a non-const member function on a temporary, and the fact that most of the member functions of std::istream return a std::istream&—a non const-reference which will then bind to a non const reference. So you can always write something like:

std::string const input = slurp(
            use_stdin
            ? std::cin.ignore( 0 )
            : std::ifstream( argv[1] ).ignore( 0 ) );

I'd consider this a bit of a hack, however, and it has the more general problem that you can't check whether the open (called by the constructor of std::ifstream worked.

More generally, although I understand what you're trying to achieve, I think you'll find that IO will almost always represent an exception. You can't read an int without having defined it first, and you can't read a line without having defined the std::string first. I agree that it's not as elegant as it could be, but then, code which correctly handles errors is rarely as elegant as one might like. (One solution here would be to derive from std::ifstream to throw an exception if the open didn't work; all you'd need is a constructor which checked for is_open() in the constructor body.)

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • +1 I like the `MultiFileInputStream` solution best. If streams API doesn't solve your problems, add a shim on top. :) – vhallac Feb 03 '12 at 11:56
  • The streams API does solve the problem. You just need to extend the implementation. (`MultiFileInputStream` inherits from `std::istream`. iostreams were designed with extension in mind, and I can't think of an application where we didn't have at least one custom `streambuf` and any number of custom manipulators.) – James Kanze Feb 03 '12 at 13:23
3

All SSA-style languages need to have phi nodes to be usable, realistically. You would run into the same problem in any case where you need to construct from two different types depending on the value of the condition. The ternary operator cannot handle such cases. Of course, in C++11 there are other tricks, like moving the stream or suchlike, or using a lambda, and the design of IOstreams is virtually the exact antithesis of what you're trying to do, so in my opinion, you would just have to make an exception.

Puppy
  • 144,682
  • 38
  • 256
  • 465
  • Thanks, I didn’t know the name of this general problem (apparently I need to re-read the Dragon Book). A phi function for iostreams is indeed what I need, and moving might be an appropriate solution. Awesome, learned something interesting. – Konrad Rudolph Feb 02 '12 at 12:59
1

Another option might be an intermediate variable to hold the stream:

std::istream&& is = argc==1? std::move(cin) : std::ifstream(argv[1]);
std::string const input = slurp(is);

Taking advantage of the fact that named rvalue references are lvalues.

Xeo
  • 129,499
  • 52
  • 291
  • 397
  • I can’t wrap my head around why this is legal. How does this make sure that the destructor of the `ifstream` is called at the end of the scope? – Konrad Rudolph Feb 02 '12 at 14:40
  • @Konrad: Are you familiar with the reference-to-const rules when binding a temporary? They apply here to. The lifetime of the temporary `ifstream` object is extended just the same as if it was bound to `std::istream const&`, and the destructor is called when the reference goes out of scope. The advantage of using an rvalue ref here is, that you can modify the object as you please. – Xeo Feb 02 '12 at 17:11
  • I am familiar with that. But if that applies here then won’t `std::cin`’s destructor be called twice if it’s bound to `is` in your code? I mean, normally it wouldn’t but what exactly is the type of the conditional expression, and how does it affect the compiler’s decision whether to bind the object’s lifetime to `is`’ scope? – Konrad Rudolph Feb 02 '12 at 17:52
  • @Konrad: Only temporaries get their destructor called. It's the same as if you bind a non-temporary to a `T const&`, the destructor won't be called on that. See [this example](http://ideone.com/VHmIX). Rvalue references work just the same. – Xeo Feb 02 '12 at 19:17
  • This isn’t the same since no conditional is involved. The whole problem revolves around this: how does C++ keep track of whether it should destroy whatever binds to `is` or not? In fact, **it fails**: http://ideone.com/pfOTe (Notice that the destructor is called twice.) – Konrad Rudolph Feb 02 '12 at 19:28
  • @Konrad: That's interesting and quite unexpected for me. I have an idea why that happens, but I wonder if that can be seen as a defect. I think I'll open a follow up question and delete this answer once I got a confirmation that your read this comment. :) And I can also replicate the behaviour on Clang 3.1 and MSVC10. – Xeo Feb 02 '12 at 20:10
  • 1
    The objects destroyed are not the same. One is the original `X("a")`, the other is the newly created object, which is the target of the move operation. See http://ideone.com/s9L62 for a more complete picture. – vhallac Feb 03 '12 at 12:10
  • 1
    @vhallac: Thanks, that makes sense. Too bad though, since this answer is now *really* useless... who wants a moved-from standard input? :( – Xeo Feb 03 '12 at 12:55