std::string and multiple concatenations

Question

Consider the following snippet, and suppose that a, b, c and d are non-empty strings.

    std::string a, b, c, d;
    d = a + b + c;

When computing the sum of those three std::string instances, standard library implementations create a first temporary std::string object, copy the concatenated contents of a and b into its internal buffer, then perform the same operation between that temporary string and c.

A fellow programmer was stressing that instead of this behaviour, operator+(std::string, std::string) could be defined to return a std::string_helper.

This object’s sole role would be to defer the actual concatenations until the moment it is cast to a std::string. Naturally, operator+(std::string_helper, std::string) would be defined to return the same helper, which would "keep in mind" the fact that it has an additional concatenation to carry out.

Such behavior would save the CPU cost of creating n-1 temporary objects, allocating their buffers, copying them, etc. So my question is: why doesn’t it already work like that? I can’t think of any drawback or limitation.
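The deferred scheme described above can be sketched with a small expression template. This is only an illustration under stated assumptions: `leaf`, `concat_expr` and `lazy` are hypothetical names (there is no std::string_helper), and an explicit `lazy()` entry point is used because user code cannot replace operator+ for two std::string operands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper marking the first operand of a deferred concatenation.
struct leaf {
    const std::string& s;
    std::size_t size() const { return s.size(); }
    void append_to(std::string& out) const { out += s; }
};

// Hypothetical helper: remembers "left + right" without building a string yet.
template <class Left>
struct concat_expr {
    Left left;                 // a leaf or a nested concat_expr, held by value
    const std::string& right;  // must outlive this object

    std::size_t size() const { return left.size() + right.size(); }
    void append_to(std::string& out) const {
        left.append_to(out);
        out += right;
    }
    // The actual concatenation happens here: one buffer, sized up front.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        return out;
    }
};

inline leaf lazy(const std::string& s) { return leaf{s}; }

inline concat_expr<leaf> operator+(leaf l, const std::string& r) {
    return concat_expr<leaf>{l, r};
}

template <class Left>
concat_expr<concat_expr<Left>> operator+(concat_expr<Left> l, const std::string& r) {
    return concat_expr<concat_expr<Left>>{l, r};
}

// Usage: each piece is copied once, into a single result string.
inline std::string demo_lazy_concat() {
    std::string a = "foo", b = "bar", c = "baz";
    std::string d = lazy(a) + b + c;
    assert(d == "foobarbaz");
    return d;
}
```

The helper only stores references, so it has to be converted before the operands go out of scope; keeping it around longer risks dangling references.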

In C++11, the temporary can be reused thanks to rvalue references. — avakar, Mar 08 '12 at 15:04
@PlasmaHH: The complexity is hidden from the user, so not particularly bad. The main drawback is that it introduces an implicit user-defined type conversion, which would break existing code that relies on an implicit conversion from `std::string`. — Mike Seymour, Mar 08 '12 at 15:10
@MikeSeymour: That's a real answer to the stated question. The answers so far simply provide workarounds. — Benjamin Lindley, Mar 08 '12 at 15:13
This is the "keep in mind" thing which is not so easy. How can you do that for ANY number of strings without using dynamic allocation, hence annihilating the benefit of the whole scheme? — fjardon, Mar 08 '12 at 15:23
Doesn't this sample code only use one temporary, with N allocations and copies? — Mooing Duck, Mar 08 '12 at 15:41
@qdii: you might be interested in the `llvm::Twine` class and in general the [Expression Template](http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template) stuff. It's quite complicated for a simple case though. Also, stale-reference issues might be introduced if one tries to keep the Twine/Expression Template object around. — Matthieu M., Mar 08 '12 at 15:53

Mike Seymour · Accepted Answer · 2012-03-08T16:04:59.903

why doesn’t it already work like that?

I can only speculate about why it was originally designed like that. Perhaps the designers of the string library simply didn't think of it; perhaps they thought the extra type conversion (see below) might make the behaviour too surprising in some situations. It is one of the oldest C++ libraries, and a lot of wisdom that we take for granted simply didn't exist in past decades.

As to why it hasn't been changed to work like that: it could break existing code, by adding an extra user-defined type conversion. Implicit conversions can only involve at most one user-defined conversion. This is specified by C++11, 13.3.3.1.2/1:

A user-defined conversion sequence consists of an initial standard conversion sequence followed by a user-defined conversion followed by a second standard conversion sequence.

Consider the following:

struct thingy {
    thingy(std::string);
};

void f(thingy);

f(some_string + another_string);

This code is fine if the type of some_string + another_string is std::string. That can be implicitly converted to thingy via the conversion constructor. However, if we were to change the definition of operator+ to give another type, then it would need two conversions (string_helper to string to thingy), and so would fail to compile.
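The failure can be checked at compile time. In this sketch, `string_helper` is a hypothetical stand-in for the proposed return type of operator+; std::is_convertible tests implicit conversion, while std::is_constructible tests direct initialization.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the proposed result type of operator+.
struct string_helper {
    operator std::string() const { return std::string(); }
};

struct thingy {
    thingy(std::string) {}
};

// One user-defined conversion each: allowed implicitly.
static_assert(std::is_convertible<std::string, thingy>::value,
              "string converts implicitly to thingy");
static_assert(std::is_convertible<string_helper, std::string>::value,
              "helper converts implicitly to string");

// helper -> string -> thingy would need two user-defined conversions: rejected.
static_assert(!std::is_convertible<string_helper, thingy>::value,
              "helper must not convert implicitly to thingy");

// Direct initialization still works: only the constructor argument is converted implicitly.
static_assert(std::is_constructible<thingy, string_helper>::value,
              "thingy can be constructed explicitly from helper");
```

So existing calls such as f(some_string + another_string) would have to spell out one of the two conversions explicitly.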

So, if the speed of string building is important, you'll need to use alternative methods like concatenation with +=. Or, according to Matthieu's answer, don't worry about it because C++11 fixes the inefficiency in a different way.

The technique was well known back when I was learning C++ (around 1990), so I doubt that the reason is that the original designer hadn't heard of it. More likely, he felt it to be poor design for the typical uses expected of `std::string`. — James Kanze, Mar 08 '12 at 15:35
@JamesKanze: Fair enough; my knowledge only goes back to the mid nineties, so I can only speculate about earlier development. — Mike Seymour, Mar 08 '12 at 15:47
@Mike: but `std::string_helper` would have an implicit cast operator to `std::string`, wouldn’t that be sufficient for the code to compile? — qdii, Mar 08 '12 at 15:48
@qdii: No, because the code already requires one implicit user-defined conversion from `string` to `thingy`. The overall conversion can't involve a second one. — Mike Seymour, Mar 08 '12 at 15:51
@MatthieuM: This is more or less the same answer as James. I’ll give the response to the one of you guys who can quote the standard :) — qdii, Mar 08 '12 at 15:57

score 6 · Answer 2 · answered Mar 08 '12 at 15:31

The obvious answer: because the standard doesn't allow it. It impacts code by introducing an additional user defined conversion in some cases: if C is a type having a user defined constructor taking an std::string, then it would make:

C obj = stringA + stringB;

illegal.

James Kanze

which user-defined conversion are you referring to? `std::string_helper` would be a class belonging to the standard library in that case. Could you develop? – qdii Mar 08 '12 at 15:44
@qdii: even so, it would be considered a user-defined conversion. The classes of the standard library are regular classes (no magic) as far as the compiler is concerned. – Matthieu M. Mar 08 '12 at 15:47
@qdii: "user-defined" means "not built in to the language"; the standard library counts as a "user". – Mike Seymour Mar 08 '12 at 15:55

Matthieu M. · Answer 3 · 2012-03-08T16:12:02.930

4

It depends.

In C++03, it is true that there may be a slight inefficiency there (comparable to Java and C#, as they use string interning, by the way). This can be alleviated using:

d = ((std::string() += a) += b) += c;  // parentheses are required: += groups right-to-left, so without them a and b would be modified

which is not really... idiomatic.

In C++11, operator+ is overloaded for rvalue references, meaning that:

d = a + b + c;

is transformed into:

d.assign(std::move(operator+(a, b).append(c)));

which is (nearly) as efficient as you can get.

The only inefficiency left in the C++11 version is that the memory is not reserved once and for all at the beginning, so there might be up to 2 reallocations and copies (one for each new string). Still, because appending is amortized O(1), unless c is much longer than b, at worst a single reallocation and copy should take place. And of course, we are talking about POD copies here (so a memcpy call).
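If that remaining reallocation matters, the buffer can be sized up front by hand. A minimal sketch; `concat3` is a hypothetical helper name, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Concatenates three strings with at most one allocation for the result.
inline std::string concat3(const std::string& a,
                           const std::string& b,
                           const std::string& c) {
    const std::size_t total = a.size() + b.size() + c.size();
    std::string out;
    out.reserve(total);              // size the buffer once
    assert(out.capacity() >= total); // guaranteed by reserve()
    out += a;                        // the appends below cannot reallocate
    out += b;
    out += c;
    return out;
}
```

With this, d = concat3(a, b, c); copies each character exactly once into the result.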

edited Mar 08 '12 at 16:12

answered Mar 08 '12 at 15:21

+1: This is interesting. What do you mean by "appending is amortized O(1)" ? – qdii Mar 08 '12 at 15:59
@qdii: Amortized O(1) is a term used in the complexity analysis of algorithms. It means that it is not always O(1) (since sometimes appending triggers a reallocation + copy of the memory), but *on average* it is O(1). This is generally done by growing the underlying buffer exponentially, so that reallocations become less and less frequent as things grow. For example, doubling the storage each time more storage is required is an adequate strategy. – Matthieu M. Mar 08 '12 at 16:11
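The effect of that growth strategy can be observed by counting capacity changes while appending. A small sketch (`count_reallocations` is a hypothetical name); the exact count depends on the standard library's growth factor, so no particular value is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n characters one at a time and counts how often the capacity
// changed, i.e. how many times the buffer was reallocated.
inline std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t reallocations = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != capacity) {
            capacity = s.capacity();
            ++reallocations;
        }
    }
    assert(s.size() == n);
    return reallocations;
}
```

With geometric growth the count grows roughly like log n, which is why the total copying work stays proportional to n.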

Luchian Grigore · Answer 4 · 2012-03-08T15:08:41.930

2

Sounds to me like something like this already exists: std::stringstream.

Only you have << instead of +. Just because operator+ exists for std::string doesn't make it the most efficient option.
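For the snippet in the question, that could look like the sketch below (`concat_with_stream` is a hypothetical name). Note that streams bring their own formatting overhead, so this is not necessarily faster for plain strings; measuring is advisable.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds the result in a stream buffer instead of chaining operator+.
inline std::string concat_with_stream(const std::string& a,
                                      const std::string& b,
                                      const std::string& c) {
    std::ostringstream out;
    out << a << b << c;
    assert(out.good());  // inserting strings into a string stream should not fail
    return out.str();    // str() returns a copy of the accumulated characters
}
```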

edited Mar 08 '12 at 15:08

answered Mar 08 '12 at 15:02

score 0 · Answer 5 · answered Mar 08 '12 at 15:04

I think if you use +=, it will be a little faster (assuming d is empty to begin with):

d += a;
d += b;
d += c;

It should be faster, as it doesn't create temporary objects. Or simply this:

d.append(a).append(b).append(c); //same as above: i.e using '+=' 3 times.

Nawaz

@MooingDuck: What exactly is no faster? – Nawaz Mar 08 '12 at 15:46
Any of the code in your post should be one memcpy of 12 bytes less than the code in the OP. – Mooing Duck Mar 08 '12 at 15:48

Cheers and hth. - Alf · Answer 6 · 2012-03-08T15:21:57.877

The main reason for not doing a string of individual + concatenations, and especially not doing that in a loop, is that it has O(n²) complexity.
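The loop case can be sketched as follows (both function names are hypothetical). The first version copies the whole accumulated string on every iteration; the second appends in place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Quadratic: every iteration builds a new string by copying all of s.
inline std::string repeat_with_plus(const std::string& piece, std::size_t n) {
    std::string s;
    for (std::size_t i = 0; i < n; ++i) {
        s = s + piece;
    }
    return s;
}

// Amortized linear: appends in place, reallocating only occasionally.
inline std::string repeat_with_append(const std::string& piece, std::size_t n) {
    std::string s;
    for (std::size_t i = 0; i < n; ++i) {
        s += piece;
    }
    return s;
}

// Both produce the same result; only the amount of copying differs.
inline void demo_repeat() {
    assert(repeat_with_plus("ab", 3) == repeat_with_append("ab", 3));
}
```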

A reasonable alternative with O(n) complexity is to use a simple string builder, like

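#include <cassert>      // assert
#include <sstream>      // std::basic_ostringstream
#include <string>       // std::basic_string
#include <type_traits>  // std::is_same

// CPP_STATIC_ASSERT is not defined in the answer; this definition is an
// assumption that it simply wraps C++11 static_assert.
#define CPP_STATIC_ASSERT( e ) static_assert( e, #e )

template< class Char >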
class ConversionToString
{
public:
    // Visual C++ 10.0 has some DLL linking problem with other types:
    CPP_STATIC_ASSERT((
        std::is_same< Char, char >::value || std::is_same< Char, wchar_t >::value
        ));

    typedef std::basic_string< Char >           String;
    typedef std::basic_ostringstream< Char >    OutStringStream;

    // Just a default implementation, not particularly efficient.
    template< class Type >
    static String from( Type const& v )
    {
        OutStringStream stream;
        stream << v;
        return stream.str();
    }

    static String const& from( String const& s )
    {
        return s;
    }
};


template< class Char, class RawChar = Char >
class StringBuilder;


template< class Char, class RawChar >
class StringBuilder
{
private:
    typedef std::basic_string< Char >       String;
    typedef std::basic_string< RawChar >    RawString;
    RawString   s_;

    template< class Type >
    static RawString fastStringFrom( Type const& v )
    {
        return ConversionToString< RawChar >::from( v );
    }

    static RawChar const* fastStringFrom( RawChar const* s )
    {
        assert( s != 0 );
        return s;
    }

    static RawChar const* fastStringFrom( Char const* s )
    {
        assert( s != 0 );
        CPP_STATIC_ASSERT( sizeof( RawChar ) == sizeof( Char ) );
        return reinterpret_cast< RawChar const* >( s );
    }

public:
    enum ToString { toString };
    enum ToPointer { toPointer };

    String const&   str() const             { return reinterpret_cast< String const& >( s_ ); }
    operator String const& () const         { return str(); }
    String const& operator<<( ToString )    { return str(); }

    RawChar const*     ptr() const          { return s_.c_str(); }
    operator RawChar const* () const        { return ptr(); }
    RawChar const* operator<<( ToPointer )  { return ptr(); }

    template< class Type >
    StringBuilder& operator<<( Type const& v )
    {
        s_ += fastStringFrom( v );
        return *this;
    }
};

template< class Char >
class StringBuilder< Char, Char >
{
private:
    typedef std::basic_string< Char >   String;
    String  s_;

    template< class Type >
    static String fastStringFrom( Type const& v )
    {
        return ConversionToString< Char >::from( v );
    }

    static Char const* fastStringFrom( Char const* s )
    {
        assert( s != 0 );
        return s;
    }

public:
    enum ToString { toString };
    enum ToPointer { toPointer };

    String const&   str() const             { return s_; }
    operator String const& () const         { return str(); }
    String const& operator<<( ToString )    { return str(); }

    Char const*     ptr() const             { return s_.c_str(); }
    operator Char const* () const           { return ptr(); }
    Char const* operator<<( ToPointer )     { return ptr(); }

    template< class Type >
    StringBuilder& operator<<( Type const& v )
    {
        s_ += fastStringFrom( v );
        return *this;
    }
};

namespace narrow {
    typedef StringBuilder<char>     S;
}  // namespace narrow

namespace wide {
    typedef StringBuilder<wchar_t>  S;
}  // namespace wide

Then you can write efficient and clear things like …

using narrow::S;

std::string a = S() << "The answer is " << 6*7;
foo( S() << "Hi, " << username << "!" );
