13

I've discovered that std::strings are very slow compared to old-fashioned null-terminated strings, so much slower that they slow down my overall program by a factor of 2.

I expected the STL to be slower, but I didn't realise it was going to be this much slower.

I'm using Visual Studio 2008 in release mode. My test shows assignment of a std::string to be 100-1000 times slower than char* assignment (it's very difficult to measure the run-time of a char* assignment). I know it's not a fair comparison, a pointer assignment versus a string copy, but my program has lots of string assignments and I'm not sure I could use the "const reference" trick in all places. With a reference-counting implementation my program would have been fine, but these implementations don't seem to exist anymore.

My real question is: why don't people use reference counting implementations anymore, and does this mean we all need to be much more careful about avoiding common performance pitfalls of std::string?

My full code is below.

#include <string>
#include <iostream>
#include <time.h>

using std::cout;

void stop()
{
}

int main(int argc, char* argv[])
{
    #define LIMIT 100000000
    clock_t start;
    std::string foo1 = "Hello there buddy";
    std::string foo2 = "Hello there buddy, yeah you too";
    std::string f;
    start = clock();
    for (int i=0; i < LIMIT; i++) {
        stop();
        f = foo1;
        foo1 = foo2;
        foo2 = f;
    }
    double stl = double(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (int i=0; i < LIMIT; i++) {
        stop();
    }
    double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC;

    const char* goo1 = "Hello there buddy";  // string literals are const
    const char* goo2 = "Hello there buddy, yeah you too";
    const char* g;
    start = clock();
    for (int i=0; i < LIMIT; i++) {
        stop();
        g = goo1;
        goo1 = goo2;
        goo2 = g;
    }
    double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
    cout << "Empty loop = " << emptyLoop << "\n";
    cout << "char* loop = " << charLoop << "\n";
    cout << "std::string = " << stl << "\n";
    cout << "slowdown = " << (stl - emptyLoop) / (charLoop - emptyLoop) << "\n";
    std::string wait;
    std::cin >> wait;
    return 0;
}
Abhinav Gauniyal
Tim Cooper
  • Could you show us some simple examples, when STL is soooo slooooow? – Tomek Szpakowicz Mar 12 '09 at 08:14
  • 9
    If char* pointer copies work for you (i.e. if deep copies aren't necessary), then so will std::string* pointer copies. So use them. No-one said you're not allowed to mix pointers and std::strings. Just as with char*, you need to make sure the pointed-to objects stay alive while you work with them. – j_random_hacker Mar 12 '09 at 10:06
  • 2
    @Tim Cooper, you are not actually making copies of those C-strings. what you copy are the handles (the pointers), but not the data they point at. it is equivalent to using swap on std::string. – Johannes Schaub - litb Mar 12 '09 at 10:41
  • 2
    Tim Cooper. try to measure the size in a loop like that. strlen vs str.size() and i bet you will see how std::string is >3x faster at least :) – Johannes Schaub - litb Mar 12 '09 at 10:43
  • 1
    In your code above, use std::swap(foo1, foo2). How fast is it now? – Martin York Mar 12 '09 at 13:57
  • 1
    Yes, std::string will be slower than a C string. But a C string provides no guarantees, so you are paying for safety. If you are getting a 100% slowdown then you are doing something wrong. I would expect less than 10%, but without understanding your use case it is hard to give an exact number. – Martin York Mar 12 '09 at 14:01
  • 1
    Depends on what you are measuring. strlen(null_terminated_str) needs to iterate over the string each time, while std::string size() just returns a member variable, so it's much faster. You would call the proper swap() method on strings (implemented with the pImpl pattern?) instead of copying strings. I believe std is good; it depends on how you use it, and it is still easy to configure in a way that works for you. – stefanB Jun 02 '09 at 00:35
  • 1
    This test is so wrong though. It's basically comparing shallow copies (copying pointers) to deep copies (copying pointee data). A significantly more comparable test would compare string copy to strcpy with variable-length C-strings allocated on the heap in which case I can just go ahead and tell you that there's only a trivial difference (1%). You can make a pointer to string too, you know? string* ptr = &some_string; You can also use shared_ptr if you want reference-counting, though reference-counting is also, even conceptually, going to cost more than a simple shallow copy of a pointer. – stinky472 Feb 27 '12 at 18:18
  • @stinky472: My point is precisely that C-style strings let you do shallow copy everywhere but C++ style strings do deep copy everywhere. Using a pointer to a string destroys all the benefits that std::string has over char*. – Tim Cooper Feb 29 '12 at 01:36
  • 4
    @TimCooper That is a completely wrong way to think about it. Using a pointer or reference or smart pointer to std::string or any other C++ object is perfectly fine as with any other object (raw pointers are perfectly fine as long as they're not owning memory). You can also swap the contents of two strings cheaply (string::swap) which is a shallow swap of pointers. If you use any C++ object, the default behavior is to *copy*. That's not a problem with classes, it's because C++ objects, by default, model copyable value types, not reference types. If you want a reference or pointer to ... – stinky472 Feb 29 '12 at 16:09
  • @TimCooper ... an object, simply make one. As to the main point of using a string over a C-string, it's not always better, but the reasons you'd generally prefer std::string if you know what you are doing is: 1. variable-size 2. stored size which avoids redundant computations to find null terminators 3. for the algorithms provided and STL-compliance for more algorithms 4. sequence model 5. RAII, etc. If you're finding the default deep copying behavior of strings to be bad and you think there's no point in making pointers, smart pointers, or references or using swap methods, then you might... – stinky472 Feb 29 '12 at 16:12
  • 3
    @TimCooper as well avoid using C++ altogether and just stick to C, since otherwise you could make the same arguments about any object: vectors, lists, strings, Qt widgets, and if you think that because the default behavior of such things is to deep copy means you can *only* deep copy, then you'll never write very efficient C++ code. It's fundamental to a C++ developer writing performance-critical code to understand where deep copies are made and how to avoid unnecessary ones and to be able to distinguish between what uses stack vs. heap. – stinky472 Feb 29 '12 at 16:15
  • This microbenchmark is horrible. gcc -O3 compiles the `char*` loop away to nothing, because the loop doesn't do anything. It shows the speedup as `inf`. I'm disappointed that it doesn't compile away the `std::string` loop, too. If anything, this just demonstrates that compilers are bad at optimizing away / hoisting string copies, compared to copies of integer/pointer types. – Peter Cordes Feb 24 '16 at 13:05

14 Answers

39

Well there are definitely known problems regarding the performance of strings and other containers. Most of them have to do with temporaries and unnecessary copies.

It's not too hard to use it right, but it's also quite easy to Do It Wrong. For example, if you see your code accepting strings by value where you don't need a modifiable parameter, you Do It Wrong:

// you do it wrong
void setMember(string a) {
    this->a = a; // better: swap(this->a, a);
}

You would have been better off taking it by const reference, or doing a swap operation inside, instead of making yet another copy. The performance penalty increases for a vector or list in that case. However, you are definitely right that there are known problems. For example, in this:

// let's add a Foo into the vector
v.push_back(Foo(a, b));

We are creating one temporary Foo just to add a new Foo into our vector. A manual solution might create the Foo directly in the vector. And if the vector reaches its capacity limit, it has to reallocate a larger memory buffer for its elements. What does it do? It copies each element separately to its new place using the copy constructor. A manual solution might behave more intelligently if it knows the type of the elements beforehand.

Another common problem is introduced temporaries. Have a look at this:

string a = b + c + e;

There are loads of temporaries created, which you might avoid in a custom solution that you actually optimize for performance. Back then, the interface of std::string was designed to be copy-on-write friendly. However, with threads becoming more popular, transparent copy-on-write strings have problems keeping their state consistent. Recent implementations tend to avoid copy-on-write strings and instead apply other tricks where appropriate.
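As a rough illustration of what such a hand-tuned concatenation could look like, here is a minimal sketch. The helper name `concat3` is made up for this example; it only assumes that the three pieces are already available as std::string objects.

```cpp
#include <string>

// Hypothetical helper (not part of the standard library): builds b + c + e
// without creating intermediate std::string temporaries.
std::string concat3(const std::string& b, const std::string& c, const std::string& e)
{
    std::string a;
    a.reserve(b.size() + c.size() + e.size()); // at most one allocation up front
    a += b;                                    // each += appends in place
    a += c;
    a += e;
    return a;                                  // moved or elided, not deep copied
}
```

Usage would be `std::string a = concat3(b, c, e);`. Whether this is measurably faster than `b + c + e` depends on the compiler and library, so it is worth timing before adopting it.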

Most of those problems are solved, however, in the next version of the Standard. For example, instead of push_back, you can use emplace_back to construct a Foo directly in your vector:

v.emplace_back(a, b);

And instead of creating copies in a concatenation above, std::string will recognize when it concatenates temporaries and optimize for those cases. Reallocation will also avoid making copies, but will move elements where appropriate to their new places.
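To make the push_back versus emplace_back difference visible, one could instrument an element type with counters. The sketch below assumes a C++11 compiler; this `Foo` and the helper functions are hypothetical stand-ins, not the Foo from the snippet above.

```cpp
#include <vector>

// Hypothetical element type that counts how its instances come into being.
struct Foo {
    static int constructed, copied, moved;
    int a, b;
    Foo(int a_, int b_) : a(a_), b(b_) { ++constructed; }
    Foo(const Foo& o) : a(o.a), b(o.b) { ++copied; }
    Foo(Foo&& o) noexcept : a(o.a), b(o.b) { ++moved; }
};
int Foo::constructed = 0;
int Foo::copied = 0;
int Foo::moved = 0;

void resetCounters() { Foo::constructed = Foo::copied = Foo::moved = 0; }

// Returns the number of copies plus moves needed to add one element.
int extraOpsPushBack()
{
    std::vector<Foo> v;
    v.reserve(1);            // rule out reallocation so only the insertion counts
    resetCounters();
    v.push_back(Foo(1, 2));  // builds a temporary, then moves it into the vector
    return Foo::copied + Foo::moved;
}

int extraOpsEmplaceBack()
{
    std::vector<Foo> v;
    v.reserve(1);
    resetCounters();
    v.emplace_back(1, 2);    // forwards the arguments, Foo is built in place
    return Foo::copied + Foo::moved;
}
```

With these definitions, `extraOpsPushBack()` reports one extra operation (the move of the temporary) and `extraOpsEmplaceBack()` reports none. Before C++11 the extra operation would have been a full copy.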

For an excellent read, consider Move Constructors by Andrei Alexandrescu.

Sometimes, however, comparisons also tend to be unfair. Standard containers have to support the features they have to support. For example if your container does not keep map element references valid while adding/removing elements from your map, then comparing your "faster" map to the standard map can become unfair, because the standard map has to ensure that elements keep being valid. That was just an example, of course, and there are many such cases that you have to keep in mind when stating "my container is faster than standard ones!!!".

Lii
Johannes Schaub - litb
  • My apologies - I significantly edited the question after posting. I wanted to discuss std::vector<> in a separate question. Anyway, I changed my test according to your suggestion, and got a 50 times slowdown. Anytime you reach the memcpy() in an implementation, you're in trouble, I reckon. – Tim Cooper Mar 12 '09 at 09:37
  • class C { std::string foo; public: void set(const std::string& _foo) { foo = _foo; } }; To completely avoid the memcpy(), I'd have to declare foo as a pointer, isn't that right? Which means I need to worry about memory allocation, just like with char*'s? – Tim Cooper Mar 12 '09 at 09:42
  • in that particular case taking the string by value and then using foo.swap(_foo); may be better. i meant to generally talk about parameter passing if you just want to pass some parameter in but don't need to take a copy for modifications. – Johannes Schaub - litb Mar 12 '09 at 10:31
  • also as far as i know msvc uses a small string optimization: for small strings, it keeps data in a buffer statically allocated as a member array, instead of using the heap. also try to use things like .reserve as much as possible. – Johannes Schaub - litb Mar 12 '09 at 10:35
  • you may like the video over here: http://video.google.com/videoplay?docid=-562129216565760352 it's alexandrescu talking about his string class using policies for COW/noCOW and small-string-optimization. interesting watch – Johannes Schaub - litb Mar 12 '09 at 10:36
11

It looks like you're misusing char* in the code you pasted. If you have

std::string a = "this is a";
std::string b = "this is b";
a = b;

you're performing a string copy operation. If you do the same with char*, you're performing a pointer copy operation.

The std::string assignment operation allocates enough memory to hold the contents of b in a, then copies each character one by one. In the case of char*, it does not do any memory allocation or copy the individual characters one by one, it just says "a now points to the same memory that b is pointing to."

My guess is that this is why std::string is slower, because it's actually copying the string, which appears to be what you want. To do a copy operation on a char* you'd need to use the strcpy() function to copy into a buffer that's already appropriately sized. Then you'll have an accurate comparison. But for the purposes of your program you should almost definitely use std::string instead.
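The three operations being compared can be put side by side. This is only a sketch with made-up function names; the buffer size of 16 is an assumption chosen to be large enough for the sample text.

```cpp
#include <cstring>
#include <string>

// Pointer assignment: both names end up referring to the same bytes.
bool pointerAssignmentAliases()
{
    const char* a = "this is a";
    const char* b = "this is b";
    a = b;                       // copies the address only (shallow copy)
    return a == b;               // one buffer, two pointers
}

// strcpy into a buffer that is already large enough: a deep copy for C strings.
bool strcpyMakesIndependentCopy()
{
    char buffer[16] = "this is a";
    const char* b = "this is b";
    std::strcpy(buffer, b);      // copies the characters, terminator included
    const char* a = buffer;
    return a != b && std::strcmp(a, b) == 0;
}

// std::string assignment: also a deep copy, each object owns its characters.
bool stringAssignmentMakesIndependentCopy()
{
    std::string a = "this is a";
    std::string b = "this is b";
    a = b;
    b[0] = 'T';                  // changing b afterwards does not affect a
    return a == "this is b" && b == "This is b";
}
```

All three functions return true. Only the first one is what the `char*` loop in the question measures, which is why its timing is not comparable to the std::string loop.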

Dan Olson
  • 2
    Everything you say is correct, but the point of the question is "to use std::string's efficiently you often need to work with pointers/references to them rather than the values themselves, in which case you're having to worry about the lifetimes of the values, and yet being free from those concerns is usually touted as the main advantage of std::string." – Tim Cooper Mar 01 '12 at 04:04
7

When writing C++ code using any utility class (whether STL or your own) instead of, e.g., good old C null-terminated strings, you need to remember a few things.

  • If you benchmark without compiler optimisations on (esp. function inlining), classes will lose. They are not built-ins, even stl. They are implemented in terms of method calls.

  • Do not create unnecessary objects.

  • Do not copy objects if possible.

  • Pass objects as references, not copies, if possible.

  • Use more specialised method and functions and higher level algorithms. Eg.:

    std::string a = "String a";
    std::string b = "String b";
    
    // Use
    a.swap(b);
    
    // Instead of
    std::string tmp = a;
    a = b;
    b = tmp;
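The exchange from the last bullet can be wrapped into small functions to compare the variants. This is a sketch with hypothetical function names; the move-based version assumes C++11.

```cpp
#include <string>
#include <utility>

// Exchange through a temporary: up to three deep copies of character data.
void exchangeByCopy(std::string& a, std::string& b)
{
    std::string tmp = a;
    a = b;
    b = tmp;
}

// Exchange with swap: the two strings trade their internal state.
void exchangeBySwap(std::string& a, std::string& b)
{
    a.swap(b);   // std::swap(a, b) has the same effect
}

// C++11 alternative: the same three steps, but with moves instead of copies.
void exchangeByMove(std::string& a, std::string& b)
{
    std::string tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}
```

All three leave `a` and `b` with exchanged contents. The difference is in how much character data gets duplicated along the way, which is what the benchmark in the question ends up measuring.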
    

And a final note: when your C-like C++ code starts to get more complex, you need to implement more advanced data structures like automatically expanding arrays, dictionaries, and efficient priority queues. And suddenly you realise that it's a lot of work and your classes are not really faster than the STL ones. Just more buggy.

Tomek Szpakowicz
5

You are most certainly doing something wrong, or at least not comparing "fairly" between STL and your own code. Of course, it's hard to be more specific without code to look at.

It could be that you're structuring your code using STL in a way that causes more constructors to run, or not re-using allocated objects in a way that matches what you do when you implement the operations yourself, and so on.
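One way to check whether extra constructors are running is to count them. The sketch below uses a hypothetical wrapper type, `Tracked`, whose copy constructor increments a counter; the function names are likewise invented for the illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper that counts copy constructions of its string payload.
struct Tracked {
    static int copies;
    std::string text;
    explicit Tracked(const std::string& t) : text(t) {}
    Tracked(const Tracked& other) : text(other.text) { ++copies; }
};
int Tracked::copies = 0;

// By value: the caller's object is copied into the parameter on every call.
std::size_t lengthByValue(Tracked t) { return t.text.size(); }

// By const reference: the function reads the caller's object directly.
std::size_t lengthByConstRef(const Tracked& t) { return t.text.size(); }

// Returns how many copy constructions `calls` invocations trigger.
int copiesMade(bool byValue, int calls)
{
    Tracked item("Hello there buddy");
    Tracked::copies = 0;
    std::size_t total = 0;
    for (int i = 0; i < calls; ++i)
        total += byValue ? lengthByValue(item) : lengthByConstRef(item);
    (void)total;                 // the sum itself is not of interest here
    return Tracked::copies;
}
```

Here `copiesMade(true, 1000)` reports 1000 copies and `copiesMade(false, 1000)` reports none. A counter like this, dropped temporarily into real code, can show where the unexpected copies come from.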

unwind
5

This test is testing two fundamentally different things: a shallow copy vs. a deep copy. It's essential to understand the difference and how to avoid deep copies in C++, since a C++ object, by default, provides value semantics for its instances (as is the case with plain old data types), which means that assigning one to the other is generally going to copy.

I "corrected" your test and got this:

char* loop = 19.921
string = 0.375
slowdown = 0.0188244

Apparently we should cease using C-style strings since they are soooo much slower! In actuality, I deliberately made my test as flawed as yours by testing shallow copying on the string side vs. strcpy on the C-string side:

#include <string>
#include <iostream>
#include <cstdlib>   // malloc, free
#include <cstring>   // strcpy, strlen
#include <ctime>

using namespace std;

#define LIMIT 100000000

char* make_string(const char* src)
{
    return strcpy((char*)malloc(strlen(src)+1), src);
}

int main(int argc, char* argv[])
{
    clock_t start;
    string foo1 = "Hello there buddy";
    string foo2 = "Hello there buddy, yeah you too";
    start = clock();
    for (int i=0; i < LIMIT; i++)
        foo1.swap(foo2);
    double stl = double(clock() - start) / CLOCKS_PER_SEC;

    char* goo1 = make_string("Hello there buddy");
    char* goo2 = make_string("Hello there buddy, yeah you too");
    char *g;
    start = clock();
    for (int i=0; i < LIMIT; i++) {
        g = make_string(goo1);
        free(goo1);
        goo1 = make_string(goo2);
        free(goo2);
        goo2 = g;
    }
    double charLoop = double(clock() - start) / CLOCKS_PER_SEC;
    cout << "char* loop = " << charLoop << "\n";
    cout << "string = " << stl << "\n";
    cout << "slowdown = " << stl / charLoop << "\n";
    string wait;
    cin >> wait;
}

The main point is, and this actually gets to the heart of your ultimate question, you have to know what you are doing with the code. If you use a C++ object, you have to know that assigning one to the other is going to make a copy of that object (unless assignment is disabled, in which case you'll get an error). You also have to know when it's appropriate to use a reference, pointer, or smart pointer to an object, and with C++11, you should also understand the difference between move and copy semantics.
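For the move versus copy distinction mentioned above, a small C++11 sketch may help. The function names are hypothetical; the behaviour of std::move and push_back is standard.

```cpp
#include <string>
#include <utility>
#include <vector>

// Copy: `source` keeps its contents, the vector receives a duplicate.
void appendCopy(std::vector<std::string>& out, const std::string& source)
{
    out.push_back(source);
}

// Move: the vector takes over the contents of `source`.
// Afterwards `source` is valid but its value is unspecified, so the caller
// should only assign to it or let it be destroyed.
void appendMove(std::vector<std::string>& out, std::string& source)
{
    out.push_back(std::move(source));
}
```

A typical use is handing off a string you no longer need: `appendMove(lines, current);` followed by `current.clear();` or a fresh assignment.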

My real question is: why don't people use reference counting implementations anymore, and does this mean we all need to be much more careful about avoiding common performance pitfalls of std::string?

People do use reference-counting implementations. Here's an example of one:

shared_ptr<string> ref_counted = make_shared<string>("test");
shared_ptr<string> shallow_copy = ref_counted; // no deep copies, just 
                                               // increase ref count

The difference is that string doesn't do it internally as that would be inefficient for those who don't need it. Things like copy-on-write are generally not done for strings either anymore for similar reasons (plus the fact that it would generally make thread safety an issue). Yet we have all the building blocks right here to do copy-on-write if we wish to do so: we have the ability to swap strings without any deep copying, we have the ability to make pointers, references, or smart pointers to them.
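As a sketch of how those building blocks could be combined, here is a minimal copy-on-write handle. `CowString` is a hypothetical class written for this illustration, and it assumes single-threaded use: the `use_count()` check is exactly the kind of state that becomes unreliable once several threads share the handle.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write handle: copies share one buffer until a write.
// Single-threaded sketch only.
class CowString {
    std::shared_ptr<std::string> data_;
public:
    explicit CowString(const std::string& s)
        : data_(std::make_shared<std::string>(s)) {}

    // Copying a CowString copies the shared_ptr only (reference count + 1).
    const std::string& read() const { return *data_; }

    // Before a write, detach from other owners by making a private deep copy.
    std::string& write()
    {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);
        return *data_;
    }

    long owners() const { return data_.use_count(); }
    bool sharesBufferWith(const CowString& other) const { return data_ == other.data_; }
};
```

Assigning one `CowString` to another is then as cheap as the shared_ptr copy, and the deep copy is deferred until someone calls `write()`. The price is an extra indirection and a reference-count update on every copy, which is the overhead the paragraph above refers to.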

To use C++ effectively, you have to get used to this way of thinking involving value semantics. If you don't, you might enjoy the added safety and convenience but do it at heavy cost to the efficiency of your code (unnecessary copies are certainly a significant part of what makes poorly written C++ code slower than C). After all, your original test is still dealing with pointers to strings, not char[] arrays. If you were using character arrays and not pointers to them, you'd likewise need to strcpy to swap them. With strings you even have a built-in swap method to do exactly what you are doing in your test efficiently, so my advice is to spend a bit more time learning C++.

stinky472
  • My whole point was that C-style strings encourage you to use pointers and do shallow copy, whereas with C++ strings you need to do complicated stuff to use them as references, and if you're working with pointers/references to C++ strings then you need to worry about what's the lifetime of the actual string, in which case one wonders what's the improvement on char*'s. – Tim Cooper Mar 01 '12 at 04:01
  • Also, when I referred to "reference counting" implementations in the question, I was referring to the fact that the early implementations of std::string were reference counting implementations: very efficient, but prone to bugs in multi-threaded programs. If the shared_ptr<> was both efficient and without the same concurrency problems, then it would be the built-in implementation of std::string without the need for that "shared_ptr" stuff. – Tim Cooper Mar 01 '12 at 04:19
  • 2
    @TimCooper But how are references and pointers complicated, when you are already using pointers in your C-style strings? If you have a struct Foo and you don't want to deep copy it, you either make a reference to it or pointer to it. The knowledge of when the compiler makes copies is equally prevalent in C, we just don't have as many facilities to build types as complex as std::string. – stinky472 Mar 01 '12 at 23:01
  • 2
    @TimCooper reference-counted and copy-on-write strings are only efficient for naive uses of them where the user doesn't understand how to avoid deep copies. But at the same time it makes those of us who use std::string where copies are actually needed and intended *slower*. It adds overhead to those of us who don't want those features. It basically adds overhead to str1 = str2 when a copy is actually intended by trying to do fancy things to avoid copying unless the string is modified. With C++ in the hands of a knowledgeable developer, he would avoid such copies anyway. – stinky472 Mar 01 '12 at 23:02
  • 1
    @TimCooper I have a co-worker who actually views things the same way. He thinks classes like std::vector should provide atomic push_backs for thread safety. That would ruin the efficiency of std::vector for cases where concurrent push_backs aren't needed. The kind of philosophy the C++ designers are going for nowadays is to make you avoid paying for things you don't need. – stinky472 Mar 01 '12 at 23:04
  • In the old days I was able to assign a string using "char* a = b;". Efficient, thread safe but I had to worry about the lifetime of the value. Then it became: std::string a = b; but this was not thread-safe, and so many multi-threaded programs broke that the compiler vendors replaced them with deep copy implementations. Now people like you are saying that best practice is to use: "const std::string& a = b;" AND still worry about 'delete's, or "shared_ptr a = make_shared(b); or a variety of techniques each suited to different situations? char*a=b seems simpler. – Tim Cooper Mar 02 '12 at 00:47
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/8421/discussion-between-tim-cooper-and-stinky472) – Tim Cooper Mar 02 '12 at 00:47
  • Why is: const std::string& a=b; better than char* a=b; if you have to worry about the value's lifetime in both cases? – Tim Cooper Mar 02 '12 at 00:53
2

If you have an indication of the eventual size of your vector you can prevent excessive resizes by calling reserve() before filling it up.
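The effect of reserve() can be observed by watching capacity() while elements are appended. The following is a sketch with a hypothetical helper name; the growth pattern without reserve() is implementation-specific, so only the reserved case has a guaranteed count.

```cpp
#include <cstddef>
#include <vector>

// Counts how often the vector's capacity changes while n elements are appended.
// Each capacity change means a reallocation that relocates all existing elements.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::vector<int> v;
    if (reserveFirst)
        v.reserve(n);            // one allocation up front
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserveFirst` set, the function returns 0, because push_back never exceeds the reserved capacity. Without it, the count depends on the library's growth factor.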

Daniel Sloof
  • My apologies, I significantly edited the question after posting to focus on std::string's. But why doesn't STL automatically reserve extra space if it sees you creating a vector using push_back()? – Tim Cooper Mar 12 '09 at 09:43
  • 3
    @Tim: Most implementations of vector do -- when space runs out, they reallocate to twice the original size. This way, adding n elements leads to at most log(n) reallocations, and you never waste more than 50% memory. – j_random_hacker Mar 12 '09 at 10:00
2

The main rules of optimization:

  • Rule 1: Don't do it.
  • Rule 2: (For experts only) Don't do it yet.

Are you sure that you have proven that it is really the STL that is slow, and not your algorithm?

Tobias Hertkorn
2

Good performance isn't always easy with STL, but generally, it is designed to give you the power. I found Scott Meyers' "Effective STL" an eye-opener for understanding how to deal with the STL efficiently. Read!

As others said, you are probably running into frequent deep copies of the string, and compare that to a pointer assignment / reference counting implementation.

Generally, any class designed towards your specific needs, will beat a generic class that's designed for the general case. But learn to use the generic class well, and learn to ride the 80:20 rules, and you will be much more efficient than someone rolling everything on their own.


One specific drawback of std::string is that it doesn't give performance guarantees, which makes sense. As Tim Cooper mentioned, STL does not say whether a string assignment creates a deep copy. That's good for a generic class, because reference counting can become a real killer in highly concurrent applications, even though it's usually the best way for a single threaded app.

peterchen
0

They didn't go wrong. STL implementation is generally speaking better than yours.

I'm sure that you can write something better for a very particular case, but a factor of 2 is too much... you really must be doing something wrong.

0

If used correctly, std::string is as efficient as char*, but with the added protection.

If you are experiencing performance problems with the STL, it's likely that you are doing something wrong.

Additionally, STL implementations are not standard across compilers. I know that SGI's STL and STLPort perform generally well.

That said, and I am being completely serious, you could be a C++ genius and have devised code that is far more sophisticated than the STL. It's not likely, but who knows, you could be the LeBron James of C++.

Alan
0

I would say that STL implementations are better than the traditional implementations. Also, did you try using a list instead of a vector? A vector is efficient for some purposes and a list for others.

Prabhu R
  • I don't believe I have ever had a situation in my own code where `std::list` was actually the best container. – David Stone May 10 '12 at 03:45
  • @David: Depends on the use case. If you're commonly inserting stuff into the middle of your collection, and don't need random access, a list would probably be a lot faster. – cHao Jul 09 '12 at 18:54
  • 1
    @cHao I don't believe that is generally true, and I just recently wrote up a more complete treatment of the subject: http://stackoverflow.com/questions/8742462/stdforward-list-and-stdforward-listpush-back/11375990#11375990 . Executive summary: the cost of searching through the list to find where in the middle you want to insert dominates in a `std::list` to the point where it's still cheaper to insert into the middle of a `std::vector` and shift everything over. In modern computers, memory locality rules. – David Stone Jul 10 '12 at 02:08
-1

std::string will always be slower than C-strings. C-strings are simply a linear array of memory. You cannot get any more efficient than that, simply as a data structure. The algorithms you use (like strcat() or strcpy()) are generally equivalent to the STL counterparts. The class instantiation and method calls will be, in relative terms, significantly slower than C-string operations (even worse if the implementation uses virtuals). The only way you could get equivalent performance is if the compiler does optimization.

Legion
  • 3
    Which it does, unless you tell it not to. Optimization is the default in C++, because so much goes on behind the scenes that to do it all when you don't need to can make stuff unbearably slow. With a decent compiler, though, C++ can be just as fast as C, if not faster. (Consider that string copies can easily be inlined in C++, and since you already have the length (unlike in C), the actual copy becomes a `rep movsb` (which is pretty much as fast as you're gonna get). – cHao Jul 09 '12 at 18:47
-1
                        string      const string&   char*       Java string
---------------------------------------------------------------------------
Efficient assignment    no **       yes             yes         yes

Thread-safe             yes         yes             yes         yes

Memory management       yes         no              no          yes
done for you

** There are 2 implementations of std::string: reference counting or deep-copy. Reference counting introduces performance problems in multi-threaded programs, EVEN for just reading strings, and deep-copy is obviously slower as shown above. See: Why VC++ Strings are not reference counted?

As this table shows, 'string' is better than 'char*' in some ways and worse in others, and 'const string&' is similar in properties to 'char*'. Personally I'm going to continue using 'char*' in many places. The enormous amount of copying of std::string's that happens silently, with implicit copy constructors and temporaries makes me somewhat ambivalent about std::string.

Community
Tim Cooper
  • I prefer const& whenever possible, pretty much for reasons like this. I would argue that memory management /is/ done for you, since you can have const& without ever having a pointer. – Jayen Apr 13 '13 at 09:26
  • Any time you use 'const&' you need to worry about the owner of the string going out of scope, at which point you have a dangling reference. So I stand by the statement that memory management is _not_ done for you. – Tim Cooper Apr 14 '13 at 10:45
-5

A large part of the reason might be the fact that reference-counting is no longer used in modern implementations of STL.

Here's the story (someone correct me if I'm wrong): in the beginning, STL implementations used reference counting, and were fast but not thread-safe - the implementors expected application programmers to insert their own locking mechanisms at higher levels, to make them thread-safe, because if locking was done at 2 levels then this would slow things down twice as much.

However, the programmers of the world were too ignorant or lazy to insert locks everywhere. For example, if a worker thread in a multi-threaded program needed to read a std::string commandline parameter, then a lock would be needed even just to read the string, otherwise crashes could ensue. (2 threads increment the reference count simultaneously on different CPU's (+1), but decrement it separately (-2), so the reference count goes down to zero, and the memory is freed.)

So implementors ditched reference counting and instead had each std::string always own its own copy of the string. More programs worked, but they were all slower.

So now, even a humble assignment of one std::string to another, (or equivalently, passing a std::string as a parameter to a function), takes about 400 machine code instructions instead of the 2 it takes to assign a char*, a slowdown of 200 times.

I tested the magnitude of the inefficiency of std::string on one major program, which had an overall slowdown of about 100% compared with null-terminated strings. I also tested raw std::string assignment using the following code, which said that std::string assignment was 100-900 times slower. (I had trouble measuring the speed of char* assignment). I also debugged into the std::string operator=() function - I ended up knee deep in the stack, about 7 layers deep, before hitting the 'memcpy()'.

I'm not sure there's any solution. Perhaps if you need your program to be fast, use plain old C++, and if you're more concerned about your own productivity, you should use Java.

#define LIMIT 800000000
clock_t start;
std::string foo1 = "Hello there buddy";
std::string foo2 = "Hello there buddy, yeah you too";
std::string f;

start = clock();
for (int i=0; i < LIMIT; i++) {
    stop();
    f    = foo1;
    foo1 = foo2;
    foo2 = f;
}
double stl = double(clock() - start) / CLOCKS_PER_SEC;

start = clock();
for (int i=0; i < LIMIT; i++) {
    stop();
}
double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC;

char* goo1 = "Hello there buddy";
char* goo2 = "Hello there buddy, yeah you too";
char *g;

start = clock();
for (int i=0; i < LIMIT; i++) {
    stop();
    g = goo1;
    goo1 = goo2;
    goo2 = g;
}
double charLoop = double(clock() - start) / CLOCKS_PER_SEC;

TfcMessage("done", 'i', "Empty loop = %1.3f s\n"
                        "char* loop = %1.3f s\n"
                        "std::string loop = %1.3f s\n\n"
                        "slowdown = %f", 
                        emptyLoop, charLoop, stl, 
                        (stl - emptyLoop) / (charLoop - emptyLoop));
Luc Touraille
Tim Cooper
  • You are comparing apples with oranges. In the second case you are just changing some pointers. The equivalent to the first code would be to call strcpy(). Also, I believe you didn't copy-paste all the code –  Mar 12 '09 at 08:33
  • 3
    Wrong, seriously wrong. Locking objects against concurrent access is still needed today. But with reference counting, you had to lock *all* strings that could possibly be a copy of the one you're working with. Furthermore, the performance of COW was measured to be slower, see google for details. – MSalters Mar 12 '09 at 08:52
  • MSalters, my apologies if I implied somewhere that removing reference counting from std::string removed the need to lock any objects against concurrent access. – Tim Cooper Mar 12 '09 at 10:20
  • Ionut Anghelcovici: I realise that char* involves copying only pointers. I guess my point is that I would rather do that than copy the full string. – Tim Cooper Mar 12 '09 at 12:10
  • If you would rather copy pointers, then just copy pointers to std::string. Otherwise it is simply not a meaningful comparison. – Vagrant Jun 18 '10 at 06:36
  • If I am going to use pointers to std::string then I'm not benefiting from std::string - I might as well use 'char*' - because I'd have all the same headaches about memory management. Wouldn't you agree? – Tim Cooper Jun 19 '10 at 13:32
  • @TimCooper `std::string` takes care of the automatic resizing of the string, so even if you did something like `new std::string`, it would be simpler to use. However, with functions, you can do `void f(std::string const & str)` to avoid a copy if you do not want copy semantics. – David Stone Jul 10 '12 at 02:14