3

When I cut my teeth on C++03, I learned several approaches to writing a "give me the collection of things" function. But each has some setbacks.

template< typename Container >
void make_collection( std::insert_iterator<Container> );

or:

void make_collection( std::vector<Thing> & );
  • This is not container agnostic
  • The interface doesn't communicate that an empty container is expected.

or:

std::vector<Thing> make_collection();
  • This is not container agnostic
  • There are several avenues for unnecessary copying. (Wrong container type, wrong contained type, no RVO, no move semantics)

Using modern C++ standards, is there a more idiomatic function interface to "produce a populated container"?

Community
  • 1
  • 1
Drew Dormann
  • 59,987
  • 13
  • 123
  • 180
  • If you want to be able to insert something into every container, including ones not visible from the .cpp file, some code must be visible in the header. That is the only spot that has information both about the target container, and the stuff to put into the container. Type erasure can be used to decouple those (and put content generating code in the .cpp file, and just glue code in the header), but that costs efficiency at runtime. What are your priorities? I mean, you can ask for cakes that remain in your hand after you swallow them, but you might not get it. – Yakk - Adam Nevraumont Mar 05 '15 at 18:09
  • The container-agnostic way of handling things lite this have always been to use a range from a pair of iterators. And of course there's always the inserter [iterator adapters](http://en.cppreference.com/w/cpp/iterator), but it's no longer completely container agnostic using them (as they can't handle e.g. container adaptors). – Some programmer dude Mar 05 '15 at 18:12
  • @JoachimPileborg that doesn't support *adding* elements? – Yakk - Adam Nevraumont Mar 05 '15 at 18:13

4 Answers4

7

The first approach is type erasure based.

template<class T>
using sink =  std::function<void(T&&)>;

A sink is a callable that consumes instances of T. Data flows in, nothing flows out (visible to the caller).

template<class Container>
auto make_inserting_sink( Container& c ) {
  using std::end; using std::inserter;
  return [c = std::ref(c)](auto&& e) {
    *inserter(c.get(), end(c.get()))++ = decltype(e)(e);
  };
}

make_inserting_sink takes a container, and generates a sink that consumes stuff to be inserted. In a perfect world, it would be make_emplacing_sink and the lambda returned would take auto&&..., but we write code for the standard libraries we have, not the standard libraries we wish to have.

Both of the above are generic library code.

In the header for your collection generation, you'd have two functions. A template glue function, and a non-template function that does the actual work:

namespace impl {
  void populate_collection( sink<int> );
}
template<class Container>
Container make_collection() {
  Container c;
  impl::populate_collection( make_inserting_sink(c) );
  return c;
}

You implement impl::populate_collection outside the header file, which simply hands over an element at a time to the sink<int>. The connection between the container requested, and the produced data, is type erased by sink.

The above assumes your collection is a collection of int. Simply change the type passed to sink and a different type is used. The collection produced need not be a collection of int, just anything that can take int as input to its insert iterator.

This is less than perfectly efficient, as the type erasure creates nearly unavoidable runtime overhead. If you replaced void populate_collection( sink<int> ) with template<class F> void populate_collection(F&&) and implemented it in the header file the type erasure overhead goes away.

std::function is new to C++11, but can be implemented in C++03 or before. The auto lambda with assignment capture is a C++14 construct, but can be implemented as a non-anonymous helper function object in C++03.

We could also optimize make_collection for something like std::vector<int> with a bit of tag dispatching (so make_collection<std::vector<int>> would avoid type erasure overhead).


Now there is a completely different approach. Instead of writing a collection generator, write generator iterators.

The first is an input iterator that call some functions to generate items and advance, the last is a sentinal iterator that compares equal to the first when the collection is exhasted.

The range can have an operator Container with SFINAE test for "is it really a container", or a .to_container<Container> that constructs the container with a pair of iterators, or the end user can do it manually.

These things are annoying to write, but Microsoft is proposing Resumable functions for C++ -- await and yield that make this kind of thing really easy to write. The generator<int> returned probably still uses type erasure, but odds are there will be ways of avoiding it.

To understand what this approach would look like, examine how python generators work (or C# generators).

// exposed in header, implemented in cpp
generator<int> get_collection() resumable {
  yield 7; // well, actually do work in here
  yield 3; // not just return a set of stuff
  yield 2; // by return I mean yield
}
// I have not looked deeply into it, but maybe the above
// can be done *without* type erasure somehow.  Maybe not,
// as yield is magic akin to lambda.

// This takes an iterable `G&& g` and uses it to fill
// a container.  In an optimal library-class version
// I'd have a SFINAE `try_reserve(c, size_at_least(g))`
// call in there, where `size_at_least` means "if there is
// a cheap way to get the size of g, do it, otherwise return
// 0" and `try_reserve` means "here is a guess asto how big
// you should be, if useful please use it".
template<class Container, class G>
Container fill_container( G&& g ) {
  Container c;
  using std::end;
  for(auto&& x:std::forward<G>(g) ) {
    *std::inserter( c, end(c) ) = decltype(x)(x);
  }
  return c;
}
auto v = fill_container<std::vector<int>>(get_collection());
auto s = fill_container<std::set<int>>(get_collection());

note how fill_container sort of looks like make_inserting_sink turned upside down.

As noted above, the pattern of a generating iterator or range can be written manually without resumable functions, and without type erasure -- I've done it before. It is reasonably annoying to get right (write them as input iterators, even if you think you should get fancy), but doable.

boost also has some helpers to write generating iterators that do not type erase and ranges.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • Nitpicking: shouldn't `std::inserter` be `std::back_inserter`? Question: would there be a problem with writing the lambda's capture `[&c]` and just using `c` inside? – bogdan Mar 05 '15 at 19:08
  • @bogdan `std::back_inserter` only works with containers that provide `push_back`, so `std::inserter` is a bit more generic (think `std::map` and `std::unordered_map`). Capturing references by reference is currently underspecified (See [CWG issue 2011](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2011)). – Casey Mar 05 '15 at 19:11
  • @Casey Valid point on `std::inserter` as a way to make it work on more containers, although I'm not sure it's such a good idea. I wasn't even thinking about maps and sets being used with this code, as adding elements to them is sufficiently different to warrant a separate implementation (I would guess that using `end()` all the time as the hint for insertion in a map may be worse than using `insert` with no hint at all, and there are more reasons for insertion to fail than for, say, a vector). – bogdan Mar 05 '15 at 19:53
  • @Casey Thanks for the link to the issue. Do you know of any compiler that *doesn't* implement capture by reference as specified in the proposed resolution? I mean, you can't have references to references in C++. It's great that the wording is being clarified, sure, but has anyone encountered any issues in real code because of this? – bogdan Mar 05 '15 at 20:02
  • @bogdan I'm aware of no such compiler, and would be just as surprised as you by a compiler that didn't capture references by reference in exactly that way. If I was writing this, I'd probably skirt the issue entirely by capturing the result of `std::inserter(c, end(c))` and making the lambda `mutable`. Yakk's solution does have the advantage of always inserting at the end of the container even if other modifications invalidate iterators between calls to the lambda. – Casey Mar 05 '15 at 20:15
  • @Bogdan So, there is an extremely efficient `[&]` capture implementation that simply grabs the stack frame pointer and uses it to access variables in the context where the lambda is defined. It makes your reference capturing lambdas all of `sizeof(void*)` in size, and should be as efficient as the proposed solution, but it breaks when you capture a reference to a reference and the outer reference goes out of scope. Given that optimization, I'll code conservatively. – Yakk - Adam Nevraumont Mar 05 '15 at 21:02
  • @Casey thanks -- that is what I get for not compiling my answer. I swapped `c.end()` for `end(c)` in order to enable ADL lookup. Also fixed `inserter` in a similar way (I can think of situations where one would want an ADL based `inserter` call) – Yakk - Adam Nevraumont Mar 05 '15 at 21:18
  • @Yakk Do you know of any compiler that implements capture by reference that way? 'Cause if it didn't happen so far, I'd say it's highly unlikely that any implementation will start using that optimization in the future, given that there's an issue in review status regarding this. As a side note, you could capture a pointer to the referenced object by copy - less typing :-). – bogdan Mar 05 '15 at 21:30
  • @template Dunno. Learn python and Haskell? – Yakk - Adam Nevraumont Mar 06 '15 at 01:09
  • @bogdan nope, but I don't know what way that defect will go. It ain't in the draft standard afaik. – Yakk - Adam Nevraumont Mar 06 '15 at 01:10
  • @Yakk Indeed, it's not, but as far as I understand it's almost ready to be submitted for approval by the whole committe. Given that it specifies something that's consistent with the treatment of references everywhere else in the language, I have high hopes... – bogdan Mar 06 '15 at 09:36
  • @bog and I guess the optimization still works for non-references captured by reference – Yakk - Adam Nevraumont Mar 06 '15 at 13:03
  • @Yakk True. Also true for references to local and parameter objects, where 'following' the reference would stay inside the current stack frame. It's only references that point 'outside' that need their own member in the closure type; it costs space, but makes using those references faster (in the simpler variant, with just the frame pointer for everything, accessing a reference captured by reference would involve one more indirection). – bogdan Mar 06 '15 at 15:37
2

If we take our inspiration from the standard, pretty much anything of the form make_<thing> is going to return <thing> by value (unless profiling indicates otherwise I don't believe returning by value should preclude a logical approach). That suggests option three. You can make it a template-template if you wish to provide a bit of container flexibility (you just have to have an understanding as to whether the allowed container is associative or not).

However depending on your needs, have you considered taking inspiration from std::generate_n and instead of making a container, provide a fill_container functionality instead? Then it would look very similar to std::generate_n, something like

template <class OutputIterator, class Generator>
void fill_container (OutputIterator first, Generator gen);

Then you can either replace elements in an existing container, or use an insert_iterator to populate from scratch, etc. The only thing you have to do is provide the appropriate generator. The name even indicates that it expects the container to be empty if you're using insertion-style iterators.

Mark B
  • 95,107
  • 10
  • 109
  • 188
0

You can do this in c++11 without container copying. Move constructor will be used instead of a copy constructor.

std::vector<Thing> make_collection()
Semyon Tikhonenko
  • 3,872
  • 6
  • 36
  • 61
0

I don't think there is one idiomatic interface to produce a populated container, but it sounds like in this case you simply need a function to construct and return a container. In that case you should prefer your last case:

std::vector<Thing> make_collection();

This approach will not produce any "unnecessary copying", as long as you are using a modern C++11-compatible compiler. The container is constructed in the function, then moved via move semantics to avoid making a copy.

  • 1
    A mention of elision would improve this answer. – Yakk - Adam Nevraumont Mar 05 '15 at 18:24
  • My understanding was that copy elision was part of the return value optimization that modern compilers applied to prevent the copying of the temporary, and that with C++11 this optimization is essentially replaced by move semantics. Or is it that the compiler now elides the move assignment? – Julian McKinlay Mar 05 '15 at 19:26