This question starts with a bit of code, just because I think it is easier to see what I am after:

/*static*/ 
void 
Url::Split
(std::list<std::string> & url
, const std::string& stringUrl
)
{
    std::string collector;
    collector.reserve(stringUrl.length());
    for (auto c : stringUrl)
    {
        if (PathSeparator == c)
        {
            url.push_back(collector);
            collector.clear(); // Sabotages my optimization with reserve() above!
        }
        else
        {
            collector.push_back(c);
        }
    }
    url.push_back(collector);
}

In the code above, the collector.reserve(stringUrl.length()); line is supposed to reduce the number of heap operations performed during the loop that follows. After all, no substring can be longer than the whole URL, so reserving that much capacity up front looks like a good idea.

But once a substring is finished and I add it to the list of URL parts, I need to reset the string to length 0 one way or another. A brief "peek definition" inspection suggests that, at least on my platform, the reserved buffer will be released, which defeats the purpose of my reserve() call.

Internally, clear() calls _Eos(0).

I could accomplish the same with collector.resize(0), but peeking at its definition reveals that it also calls _Eos(newsize) internally, so the behavior is the same as when calling clear().

Now the question is whether there is a portable way to achieve the intended optimization, and which std::string function would help me with that.

Of course I could write collector[0] = '\0'; but that looks very off to me.

Side note: While I found similar questions, I do not think this is a duplicate of any of them.

Thanks, in advance.

BitTickler

1 Answer

In the C++11 standard, clear is defined in terms of erase, which is defined as value replacement. There is no obvious guarantee that the buffer isn't deallocated. Such a guarantee might be there, implied by other wording, but I failed to find one.

Without a formal guarantee that clear doesn't deallocate, and it appears that at least as of C++11 it isn't there, you have the following options:

  • Ignore the problem.
    After all, chances are that the microseconds incurred by dynamic buffer allocation will be absolutely irrelevant, and in addition, even without a formal guarantee, the chance of clear deallocating is very low.

  • Require a C++ implementation where clear doesn't deallocate.
    (You can add an assert to this effect, checking .capacity().)

  • Do your own buffer implementation.
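
The second option might be sketched roughly as follows. This is only an illustration, not the asker's actual code: `split_checked` and its `separator` parameter are hypothetical names, a free function standing in for `Url::Split`. The assertion documents the assumption that `clear()` keeps the reserved buffer, and it fails loudly in a debug build on an implementation where that does not hold.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical free-function variant of Url::Split from the question.
void split_checked(std::list<std::string>& parts,
                   const std::string& url,
                   const char separator = '/')
{
    std::string collector;
    collector.reserve(url.length());
    // Remember how much capacity the implementation actually gave us.
    const std::size_t reserved = collector.capacity();

    for (const char c : url)
    {
        if (c == separator)
        {
            parts.push_back(collector);
            collector.clear();
            // Assumption, not a standard guarantee: clear() keeps the buffer.
            // Compiled out when NDEBUG is defined.
            assert(collector.capacity() >= reserved);
        }
        else
        {
            collector.push_back(c);
        }
    }
    parts.push_back(collector);
}
```

With this in place, a release build behaves exactly like the original loop, while a debug build on an unusual standard library would stop at the first `clear()` that shrinks the buffer.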


Ignoring the problem appears to be safe even where the allocations (if performed) would be time critical, because with common implementations clear does not reduce the capacity.

E.g., here with g++ and Visual C++ as examples:

#include <iostream>
#include <string>
using namespace std;

auto main() -> int
{
    string s = "Blah blah blah";
    cout << s.capacity();
    s.clear();
    cout << ' ' << s.capacity() << endl;
}

C:\my\so\0284>g++ keep_capacity.cpp -std=c++11

C:\my\so\0284>a
14 14

C:\my\so\0284>cl keep_capacity.cpp /Feb
keep_capacity.cpp

C:\my\so\0284>b
15 15

C:\my\so\0284>_

Doing your own buffer management, if you really want to take it that far, can be done as follows:

#include <iostream>
#include <string>
#include <vector>

namespace my {
    using std::string;
    using std::vector;

    class Collector
    {
    private:
        vector<char>    buffer_;
        int             size_;

    public:
        auto str() const
            -> string
        { return string( buffer_.begin(), buffer_.begin() + size_ ); }

        auto size() const -> int { return size_; }

        void append( const char c )
        {
            if( size_ < int( buffer_.size() ) )
            {
                buffer_[size_++] = c;
            }
            else
            {
                buffer_.push_back( c );
                buffer_.resize( buffer_.capacity() );
                ++size_;
            }
        }

        void clear() { size_ = 0; }

        explicit Collector( const int initial_capacity = 0 )
            : buffer_( initial_capacity )
            , size_( 0 )
        { buffer_.resize( buffer_.capacity() ); }
    };

    auto split( const string& url, const char pathSeparator = '/' )
        -> vector<string>
    {
        vector<string>  result;
        Collector       collector( url.length() );

        for( const auto c : url )
        {
            if( pathSeparator == c )
            {
                result.push_back( collector.str() );
                collector.clear();
            }
            else
            {
                collector.append( c );
            }
        }
        if( collector.size() > 0 ) { result.push_back( collector.str() ); }
        return result;
    }
}  // namespace my

auto main() -> int
{
    using namespace std;
    auto const url = "http://en.wikipedia.org/wiki/Uniform_resource_locator";

    for( string const& part : my::split( url ) )
    {
        cout << '[' << part << ']' << endl;
    }
}
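
If you are willing to rely on `std::vector::clear` keeping its capacity, which common implementations do (whether the standard strictly guarantees it is debated in the comments below), the `Collector` class can be replaced by a plain `std::vector<char>`. The following is a minimal sketch under that assumption; `split_with_vector` is a hypothetical name, and it mirrors `my::split` above, including dropping an empty trailing part.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of my::split using std::vector<char> as the collector.
std::vector<std::string> split_with_vector(const std::string& url,
                                           const char separator = '/')
{
    std::vector<std::string> result;
    std::vector<char> collector;
    collector.reserve(url.length());

    for (const char c : url)
    {
        if (c == separator)
        {
            // Build the part directly from the collected characters.
            result.emplace_back(collector.begin(), collector.end());
            // Sets size to 0; assumed (not proven here) to keep the capacity.
            collector.clear();
        }
        else
        {
            collector.push_back(c);
        }
    }
    if (!collector.empty())
    {
        result.emplace_back(collector.begin(), collector.end());
    }
    return result;
}
```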

Cheers and hth. - Alf
  • vectors have explicit wording in the standard ("No reallocation shall take place...", discussed to no end in http://stackoverflow.com/questions/18467624 ), but basic_string doesn't have an equivalent.. seems like it's just underspecified w.r.t capacity. – Cubbi May 29 '15 at 03:26
  • @Cubbi: Well then the `Collector` class above is not necessary, one can just use a `std::vector` directly. That's what I thought I remembered but I didn't care to sit down for a possible wild-goose chase in the standard, so I just wrote the code. :) – Cheers and hth. - Alf May 29 '15 at 03:31
  • @Cubbi: :( I looked up the answer you refer to, and, well, it's not there. In particular the interpretation of "No reallocation shall take place until" as a guarantee against de/re-allocation, is incompatible with swapping and with move assignment. Okay, so, bright side: `Collector` class above was needed after all, just to be painfully pedantic about things. – Cheers and hth. - Alf May 29 '15 at 03:41