6

back_inserter and insert_iterator are very handy, but they're also very inefficient!

When you're appending chars, for example, there is a great deal of overhead for every element when you're copying, when in many situations, there really doesn't need to be.

Is there a way to make them more efficient?

user541686
  • 205,094
  • 128
  • 528
  • 886
  • So what's the performance difference (in %, on your machine)? Presumably the main overhead per character is checking `size` against `capacity` and incrementing `size`? – Steve Jessop Sep 08 '12 at 02:09
  • @SteveJessop: Hmm, what do you expect it to be? I get a 5x difference. Compare [**this**](http://ideone.com/TaXri) with [**this**](http://ideone.com/sGrjg). – user541686 Sep 08 '12 at 02:43
  • @SteveJessop: Yes, that's a big part of it -- but there's more: it's also an extra pointer dereference per character. – user541686 Sep 08 '12 at 02:51
  • I don't have a particular expectation, I'm just shocked and appalled that you've claimed an optimization without supplying benchmark figures ;-) What optimization flags does ideone use? On my machine with `-O3` the difference between your two programs is slightly less dramatic, but still a factor of 3. I haven't run any code that uses your actual version of `copy`. – Steve Jessop Sep 08 '12 at 02:51
  • @SteveJessop: Well, to be honest, I was shocked that it wasn't obvious enough to avoid the need for me to show a benchmark. ;) I don't know what IDEOne uses, but a factor of 3 is still pretty darn bad. On my machine (VC++ 10 with /O2), it's a factor of 10 difference, though -- 60 ms against 600 ms. That's a pretty nasty slowdown. – user541686 Sep 08 '12 at 02:53
  • @SteveJessop: FYI, I also get 60ms with my answer on VC++, so it's exactly the same as calling `reserve` -- a 10-fold improvement. – user541686 Sep 08 '12 at 02:58
  • Heh. I guess the thing is that if someone's selling me an optimization then I want to know whether it's actually an impressive one, or just some rubbishy 10%. And as you'd expect, in your benchmarks `y.insert(y.end(), x.begin(), x.end())` performs the same for me as the fast one. I suspect it's only when you have a generic function that calls `copy` that you care about the performance of calling `copy` with an `insert_iterator`, because you wouldn't normally do that deliberately. Since your code genuinely delivers the different between the two, fair enough. – Steve Jessop Sep 08 '12 at 02:58
  • @SteveJessop: "You don't normally do that deliberately" ...huh? I most certainly did. I called a function and passed it an output iterator. It calls `std::copy` internally. That's about as "normal" as I can imagine the code to possibly be. – user541686 Sep 08 '12 at 03:01
  • That's what I said, "it's only when you have a generic function that calls `copy`". You *didn't* deliberately call `copy` with an `insert_iterator`, you called something that turned out to be implemented that way. Not a major point other than that your toy examples aren't real -- with a bit more work we could produce a realistic usage, and maybe even include your actual solution rather than a one-liner that claims to simulate your actual solution ;-p – Steve Jessop Sep 08 '12 at 03:02
  • @Mehrdad: Are you using checked or unchecked iterators? – David Rodríguez - dribeas Sep 08 '12 at 03:06
  • @DavidRodríguez-dribeas: Unchecked. Checked iterators are "only" slower by a factor of ~2 instead of 10. – user541686 Sep 08 '12 at 03:15
  • Another question, the test was with the code in ideone or with the proposed optimized code below? – David Rodríguez - dribeas Sep 08 '12 at 03:20
  • @DavidRodríguez-dribeas: Both; same results. For your benchmarking pleasure, here's the example for the one below: compare [**this**](http://ideone.com/pNhSw) with [**this**](http://ideone.com/4nqOC). (On my computer, it's 59 ms against 632 ms with VC++ 10.) – user541686 Sep 08 '12 at 03:24
  • BTW, replacing your `copy` with the constructor taking two iterators is faster than your modified algorithm. – David Rodríguez - dribeas Sep 08 '12 at 03:52
  • @DavidRodríguez-dribeas: Yeah, and deleting the loop makes it *even faster*! How did I not think of that?! – user541686 Sep 08 '12 at 03:55
  • ... and an empty main is even faster, but the difference is that the two iterator constructor and the three iterator `std::vector<>::insert` are both faster (even without the explicit `reserve`), and neither of those options causes undefined behavior. As @SteveJessop, the only advantage would be in generic code, but your algorithm *requires* that the iterator is an insert iterator into a vector, which is not *that* generic. As a side note, it is quite simple to correct the UB in your code, but it requires moving the algorithm outside of `std`... – David Rodríguez - dribeas Sep 08 '12 at 16:32
  • @David: I was thinking about this whole setup, and how much more convenient it would be if there were some concept of a "range" in C++ rather than iterators, and if the equivalent of `back_inserter` therefore returned a range that was aware of the fact that it's backed by a Sequence, and if `copy` was an assignment of one range to another, and that `back_range` therefore could naturally `insert()` when assigned to. If only someone had advocated ranges in preference to iterators, in time for C++11 consideration ;-) – Steve Jessop Sep 08 '12 at 17:54
  • @SteveJessop: AFAIK it was discussed but they did not reach a consensus, or maybe it was too late and they decided to leave that aside for a later time. I know of three similar although different *proposals* (articles, not real proposals to the standard) on improving algorithms. One coming from boost (iterator/range libraries), one by Alexandrescu (I think the article is called *On iteration*) and another from Dietmar Kühl (I have read a draft, he has the implementation in his page. Andrei proposes the same thing you do: using algorithms directly on ranges, as a primitive [...] – David Rodríguez - dribeas Sep 08 '12 at 18:02
  • [...] Dietmar, on the other hand believes that the primitives must still be iterators, but that the library should be more generic and accept ranges (that would be converted internally into two iterators). His main focus is the presence of *property maps*, which are functors that map from the stored object to what the algorithm uses, so for example, you could pass a `_2nd` to `sort` when dealing with a vector of pairs so that before evaluating the predicate it would extract the `.second` field. I can try and locate the articles. – David Rodríguez - dribeas Sep 08 '12 at 18:05

1 Answers1

1

Yes, you can define a new version of std::copy which can hijack optimizable calls. :)

Below is an example (or "hack", if you prefer to see the glass half-empty) for Visual C++ and GCC.

On my personal computer (I use VC++ 2010), the code below makes calls ten times faster!
A benchmark for GCC's is also here, showing a 5x difference: old version against new version

But before you use it:

Note that this code assumes the container provides a vector-like interface.

As currently written, this only works for C++11, because it uses the type_traits header's metaprogramming capabilities to only optimize those situations in which the copy operation would stay exception-safe.

If you don't need the exception safety (though you should think twice before actually doing this), or if you have another way of checking for such data types, then you can change

typename enable_if<..., typename insert_iterator<C> >::type

to:

insert_iterator<C>

and the rest of the code should work for C++03 as well.

namespace std
{
    template<class FwdIt, class C>
    back_insert_iterator<C> copy(
        FwdIt begin, FwdIt end, back_insert_iterator<C> it,
        forward_iterator_tag * =
          static_cast<typename iterator_traits<FwdIt>::iterator_category *>(0))
    {
        struct It : public back_insert_iterator<C>
        {
            using back_insert_iterator<C>::container;
            static C &deref(C &c) { return  c; }
            static C &deref(C *c) { return *c; }
        };
        copy(begin, end, inserter(It::deref(static_cast<It &>(it).container),
                      It::deref(static_cast<It &>(it).container).end()));
        return it;
    }

    template<class FwdIt, class C>
    typename enable_if<  // Only do this if it would be exception-safe!
        is_nothrow_copy_constructible<typename C::value_type>::value &&
        is_nothrow_copy_assignable<typename C::value_type>::value,
        insert_iterator<C>
    >::type copy(
        FwdIt const &begin, FwdIt const &end,
        insert_iterator<C> output,
        forward_iterator_tag * =                  // only forward iterators
          static_cast<typename iterator_traits<FwdIt>::iterator_category *>(0))
    {
        struct It : public insert_iterator<C>
        {
            using insert_iterator<C>::container;  // protected -> public
            using insert_iterator<C>::iter;       // protected -> public
            static C &deref(C &c) { return  c; }
            static C &deref(C *c) { return *c; }
        };
        It &it(static_cast<It &>(output));
        typename C::iterator it_iter_end(It::deref(it.container).end());
        {
            // Convert iterators to offsets
            typename C::size_type const iter_end_off =
                std::distance(It::deref(it.container).begin(), it_iter_end);
            typename iterator_traits<typename C::iterator>::difference_type off
                = std::distance(It::deref(it.container).begin(), it.iter);

            // Resize container
            It::deref(it.container).resize(
                It::deref(it.container).size() +
                static_cast<typename C::size_type>(std::distance(begin, end)));
            
            // Renormalize, in case invalidated
            it.iter = It::deref(it.container).begin();
            std::advance(it.iter, off);
            it_iter_end = It::deref(it.container).begin();
            std::advance(it_iter_end, iter_end_off);
        }
        typename C::iterator result
          = copy_backward(it.iter, it_iter_end, It::deref(it.container).end());
        copy_backward(begin, end, result);
        return inserter(It::deref(it.container), result);
    }
}
Community
  • 1
  • 1
user541686
  • 205,094
  • 128
  • 528
  • 886
  • 2
    You are not allowed to add those specializations inside the `std` namespace. You can use the same code but it must be outside of that namespace. The cast `static_cast(output)` can also be undefined behavior depending on the type of `output`. – David Rodríguez - dribeas Sep 08 '12 at 01:37
  • @DavidRodríguez-dribeas, I don't know who's right, but this: *Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)* came from [this answer](http://stackoverflow.com/a/228797/962089). – chris Sep 08 '12 at 02:12
  • 1
    @chris: You are allowed to add specializations of standard templates for *your own user defined types*, not just *any type*. Besides that, there is no such thing as a function template partial specializations, the code above are *overloads* of `std::copy`, not *specializations* of the function template. – David Rodríguez - dribeas Sep 08 '12 at 02:14
  • @DavidRodríguez-dribeas, Good points. Thanks for elaborating. – chris Sep 08 '12 at 02:16
  • 1
    @chris: 17.4.3.1p1 *A program may add template specializations for any standard library template to namespace std. Such a specialization (complete or partial) of a standard library template results in undefined behavior unless the declaration depends on a user-defined name of external linkage and unless the specialization meets the standard library requirements for the original template.* – David Rodríguez - dribeas Sep 08 '12 at 02:30
  • @DavidRodríguez-dribeas: Like I said, feel free to call it a "hack", if you'd rather see the glass half-empty. – user541686 Sep 08 '12 at 02:30
  • @Mehrdad: I just wanted to point out that this *hack* is Undefined Behavior. If you are willing to undertake or not is a different question, but it should be stated. Also note that depending on the type in the container, this might actually hit a performance penalty, for example for *movable* types the cost of your hack will be greater than what most standard libraries offer. Read *movable* in a wide sense, Dinkumware has been *moving* objects for a long time before *rvalue-references* were in the language. – David Rodríguez - dribeas Sep 08 '12 at 02:34
  • ... different libraries have optimizations for bitwise movable types that you are missing in your implementation. Note that it is an interesting hack, but all considerations should be stated. – David Rodríguez - dribeas Sep 08 '12 at 02:38
  • @DavidRodríguez-dribeas: Yes, and indeed, I *did* quite clearly state it's a hack, because I knew you (or someone else) was going to shoot down the idea if I didn't... though looks like it didn't matter anyway. :\ – user541686 Sep 08 '12 at 02:49
  • 1
    @Mehrdad, But annotating an answer with useful information is not "shooting down the idea". Even if it is stated in the answer, assume it as an emphasis. – cartoonist Jun 11 '17 at 13:02
  • @cartoonist: This post is 5 years old, but for the sake of not ignoring you, I'll reply: there's a group of programmers on StackOverflow (notably many C++ ones) that tend to nitpick on the stupidest and most utterly irrelevant technicalities. It's impossible to satisfy them with non-ISO-standard-blessed code in your answer. No matter how useless or pointless it is, one of them *will* leave a comment to just to sling mud on your answer and show how smart they are, and every damn time you have to defend yourself to avoid starting a chain of downvotes into oblivion. I've gotten sick of that. – user541686 Jun 11 '17 at 13:31
  • @Mehrdad, apparently you know this community better. For me, as a reader of this question and answer individually, it was useful that there's a comment shedding light on this fact that the *hack* you mentioned in your answer actually means that it is not *specialization* (which is allowed in ISO) but function *overloading*. – cartoonist Jun 11 '17 at 17:27
  • @cartoonist: I guess it's good that someone finally found it useful after 5 years! =P – user541686 Jun 11 '17 at 22:09