Why do std::string operations perform poorly?

Question

I made a test to compare string operations in several languages for choosing a language for the server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder if I had missed any optimization and come here for help.

The test are mainly intensive string operations, including concatenate and searching. The test is performed on Ubuntu 11.10 amd64, with GCC's version 4.6.1. The machine is Dell Optiplex 960, with 4G RAM, and Quad-core CPU.

in Python (2.7.2):

def test():
    x = ""
    limit = 102 * 1024
    while len(x) < limit:
        x += "X"
        if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:
            print("Oh my god, this is impossible!")
    print("x's length is : %d" % len(x))

test()

which gives result:

x's length is : 104448

real    0m8.799s
user    0m8.769s
sys     0m0.008s

in Java (OpenJDK-7):

public class test {
    public static void main(String[] args) {
        int x = 0;
        int limit = 102 * 1024;
        String s="";
        for (; s.length() < limit;) {
            s += "X";
            if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
            System.out.printf("Find!\n");
        }
        System.out.printf("x's length = %d\n", s.length());
    }
}

which gives result:

x's length = 104448

real    0m50.436s
user    0m50.431s
sys     0m0.488s

in Javascript (Nodejs 0.6.3)

function test()
{
    var x = "";
    var limit = 102 * 1024;
    while (x.length < limit) {
        x += "X";
        if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)
            console.log("OK");
    }
    console.log("x's length = " + x.length);
}();

which gives result:

x's length = 104448

real    0m3.115s
user    0m3.084s
sys     0m0.048s

in C++ (g++ -Ofast)

It's not surprising that Nodejs performas better than Python or Java. But I expected libstdc++ would give much better performance than Nodejs, whose result really suprised me.

#include <iostream>
#include <string>
using namespace std;
void test()
{
    int x = 0;
    int limit = 102 * 1024;
    string s("");
    for (; s.size() < limit;) {
        s += "X";
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
            cout << "Find!" << endl;
    }
    cout << "x's length = " << s.size() << endl;
}

int main()
{
    test();
}

which gives result:

x length = 104448

real    0m5.905s
user    0m5.900s
sys     0m0.000s

Brief Summary

OK, now let's see the summary:

javascript on Nodejs(V8): 3.1s
Python on CPython 2.7.2 : 8.8s
C++ with libstdc++: 5.9s
Java on OpenJDK 7: 50.4s

Surprisingly! I tried "-O2, -O3" in C++ but noting helped. C++ seems about only 50% performance of javascript in V8, and even poor than CPython. Could anyone explain to me if I had missed some optimization in GCC or is this just the case? Thank you a lot.

You are testing a mixture of operations, you should probably try to divide the test into different tests that perform different checks for performance, for example: growing strings, or finding, or ... currently you cannot know where the time is being spent. And BTW, this is probably a quite useless test to decide on a language... — David Rodríguez - dribeas, Nov 29 '11 at 11:38
@AshBurlaczenko possibly because strings in Java are immutable. I suppose `s += "X"` is a performance killer there. That's the reason `StringBuilder` exists. — R. Martinho Fernandes, Nov 29 '11 at 11:39
Yeah, it's really 50s, and I tried several times. I supposed Java would perform better than Python at least, though. — Wu Shu, Nov 29 '11 at 11:39
Get rid of the console output from your testing, it will be skewing your results — Andrew Walker, Nov 29 '11 at 11:41
@AshBurlaczenko: In java strings are immutable and pooled, therefore dead slow. Normally you use stringbuilders to assemble strings. This whole thing here is comparing apples and oranges anyways. — PlasmaHH, Nov 29 '11 at 11:41
You're also including the runtime startup and termination of each language in your results. — R. Martinho Fernandes, Nov 29 '11 at 11:46
@AndrewWalker I don't think the console output is supposed to occur except for a single line at the end... — Karl Knechtel, Nov 29 '11 at 11:54
Maybe try a different compiler as well. I compiled the sample using Visual Studio 2010. Runtime 1.79 Seconds on a XEON 2.8 GHz. — RED SOFT ADAIR, Nov 29 '11 at 14:22
@David: It would be meaningless if I test each type of operation separately, for there won't be winners on all operations. This test is to emulate a server daemon that listens on a socket or waits on the terminal for character stream, and analysis it in real time. But for simplicity, the test program omits a lot of not-so-relevant operations. — Wu Shu, Nov 29 '11 at 15:09
@WuShu: Do you intend on loading a character at a time? If so, then this test is representative of your usage, and the best thing you can do it not deciding a language, but looking for a better approach. Dividing the test will provide with more information, as it will let you know for each language what operations are most expensive and will thus let you optimize your processing for that and still the test will be absurd. Just in C++, changing from `s+="X"` to `s.push_back('X')` will probably have a greater impact than the differences with other languages... — David Rodríguez - dribeas, Nov 29 '11 at 15:32
Erm, how did C++ perform more poorly than CPython? (You show 5.5s with C++, 8.8 seconds with CPython) — Billy ONeal, Nov 29 '11 at 17:37
Not as relevant to the C++ part of the Question, but Matt Joiner is correct in his assessment of the Java solution. Use a StringBuffer to concatenate instead of a String. Java Strings are not mutable, and it will create a new String every time you += to it. From http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuffer.html, "A string buffer is like a String, but can be modified. " — Ogre Psalm33, Nov 29 '11 at 18:31
@REDSOFTADAIR I confirm that a different compiler can help. On my machine compiling with (visual studio 2010) cl /EHsc -O2 gives 1.753 s (close to your result) while python 2.7.1 takes 6.266 s (also more or less in line with the OP results) — Francesco, Nov 29 '11 at 18:52
You should ask someone who knows the language to implement your test case. Simply using StringBuilder in Java cuts the running time from 75 seconds to 4 on my machine. — jackrabbit, Nov 29 '11 at 21:12
@David: I tried s.push_back('X') but only improved about 0.1s. As of others' and my option, the bottle neck is mainly the searching procedure. And I also tested string::find is quicker than strstr or memmem if "-O" disabled, but much slower if "-O2" enabled. So string::find maybe hard to optimized. — Wu Shu, Nov 30 '11 at 01:59
@Billy: Well, it's not so fair to directly compare CPython and C++ using 'time' command, for CPython spends more time than C++ to start the engine, that is, loading and compiling into byte-code. So I supposed CPython faster if it C++ didn't exceeds much much more, which of course, not very accurately. — Wu Shu, Nov 30 '11 at 02:04
@AshBurlaczenko: Yeah you are right. I tested with StringBuilder and Java gave the score of 2.8s, the best performance within all the tests, which of course still slower than strstr in pure C (about 1.6s). While strangely Java with StringBuilder still performed about 30% slower than Nodejs on my Thinkpad R400 laptop. — Wu Shu, Nov 30 '11 at 02:10
@Wu: You can make the same argument against Java or Node. (Try writing a minimal hello world python app and subtract the time that takes from the time your benchmark took; that should get rid of startup delay) — Billy ONeal, Nov 30 '11 at 02:39
@PlasmaHH Note that Lua strings are also immutable and pooled (interned), and I found that a Lua version of this test runs over twice as fast as the next fastest performer (nodejs/v8). [about 1.5s for Lua vs. about 3.1s for nodejs] — snogglethorpe, Nov 30 '11 at 09:39
You might also want to try profile guided optimizations -- in C and C++ this can be significant for loops like this (node.js/V8 are certainly compiled with such things on) — Billy ONeal, Dec 01 '11 at 07:52
It appears that GNU libstdc++ inline coded a for loop for std::string find. Dumb move. memmem and strstr are much more optimized with hand-written versions in assembly taking advantage of SSE2 and SSE4 extensions. — Zan Lynx, Feb 18 '14 at 22:26

Matt Joiner · Answer 1 · 2018-03-28T23:42:12.027

It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.

Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.

I know for a fact that Python string objects are completely implemented in C, and indeed on Python 2.7, numerous optimizations exist due to the lack of separation between unicode strings and bytes. If you ran this test on Python 3.x you will find it considerably slower.

Javascript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.

Your Java result may be due to improper string handling, or some other poor case. I expect that a Java expert could step in and fix this test with a few changes.

As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.

I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.

I've provided a C version to show how silly such pissing contests can be:

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test()
{
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main()
{
    test();
    return 0;
}

Timing:

matt@stanley:~/Desktop$ time ./smash 
x's length = 104448

real    0m0.681s
user    0m0.680s
sys     0m0.000s

FWIW the difference between Python 2.7 and 3.2 is just under 10%. It is possible that PEP 393 will remove that difference in Python 3.3. Also it might be worth mentioned that searching strings in Python uses a form of Boyer-Moore so when the string gets longer it should gain an advantage over languages that do a plain search. — Duncan, Nov 29 '11 at 12:20
@Matt: Well, the C program is too extreme... I didn't tried to make a battle or contest between languages, for each language has its optimization in different ways. I just want to find a language that can proceed strings with quite good efficiency. The program just described a case that a program reads from an input(console or socket), maybe decrypts it then, and searches through the string for a specified pattern. My test program simplified the procedure and was just a demo, of course. The result just reminds me C++ is not always the sharpest knife. And thanks anyway :) — Wu Shu, Nov 29 '11 at 14:01
@Wu Shu: If the specific pattern to search for is fixed and predetermined, you can construct an automaton to search for the pattern. This will be much faster than repeated calls to `std::string::find`. — swegi, Nov 29 '11 at 15:43
@WuShu: actually, C and C++ are probably the sharpest knives. It's just that Python and Node.js may be chainsaws. It's heavy and sometimes overkill, but when you tire in C++ you appreciate the "batteries included" approach they took in Python. — Matthieu M., Nov 29 '11 at 20:38
In java, using StringBuilder instead of String speeds it up (on my machine) aproximately 4 times, rest is searching. In java, string is immutable, so what he is doing is atrociously wrong string manipulation in java. Then there is issue of timing VM start instead of timing usefulactions (it is issue for all languages on VM, not just java) — Alpedar, Nov 30 '11 at 12:02

score 34 · Answer 2 · answered Nov 29 '11 at 12:14

34

So I went and played a bit with this on ideone.org.

Here a slightly modified version of your original C++ program, but with the appending in the loop eliminated, so it only measures the call to std::string::find(). Note that I had to cut the number of iterations to ~40%, otherwise ideone.org would kill the process.

#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 42 * 1024;
    unsigned int found = 0;

    //std::string s;
    std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        //s += 'X';
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
            ++found;
    }

    if(found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';

    return 0;
}

My results at ideone.org are time: 3.37s. (Of course, this is highly questionably, but indulge me for a moment and wait for the other result.)

Now we take this code and swap the commented lines, to test appending, rather than finding. Note that, this time, I had increased the number of iterations tenfold in trying to see any time result at all.

#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 1020 * 1024;
    unsigned int found = 0;

    std::string s;
    //std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        s += 'X';
        //if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
        //    ++found;
    }

    if(found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';

    return 0;
}

My results at ideone.org, despite the tenfold increase in iterations, are time: 0s.

My conclusion: This benchmark is, in C++, highly dominated by the searching operation, the appending of the character in the loop has no influence on the result at all. Was that really your intention?

answered Nov 29 '11 at 12:14

sbi

219,715
46
258
445

5

@sbi: and that is when one notes than in C++ `find` is O(N), while in `Python` `indexOf` uses Boyer-Moore (as noted by Duncan in a comment). Once again, "batteries included". – Matthieu M. Nov 29 '11 at 12:56
7

@Matthieu M.: Boyer-Moore does not gain you anything here, because the first character of the search-for string is not found at all in the search-into string. On the contrary, it may be adding some overhead, needlessly processing the search-for string in every iteration of the loop. – Mike Nakis Nov 29 '11 at 16:06
1

Are we sure that string::find(const char*) isn't just implemented in terms of string::find(const string&)? If it was, memory allocations could be expensive here. – Kylotan Nov 29 '11 at 18:19
@Kylotan: I tested both. No visible difference. – Matthieu M. Nov 29 '11 at 20:34
@MikeNakis: Indeed, I tested it and even doing the loop invariant code motion by hand (to move the pattern analysis out of the loop) the boyer-moore search was still slower. Therefore I suspect they use something even trickier, perhaps closer to `memmem`. – Matthieu M. Nov 29 '11 at 20:35
@MatthieuM. WRT C++ find being linear time, am I missing something? Boyer-Moore is also linear time, isn't it? – Tim Seguine Sep 09 '13 at 15:13
@Tim: linear can be faster than linear :) – Matthieu M. Sep 09 '13 at 16:34
@MatthieuM. of course, but the wording of your comment implied a contrast of some sort. OK, no more language lawyering from me. – Tim Seguine Sep 09 '13 at 16:36
I suspect the reason `s += 'X';` results in `0s` is because the compiler is detecting the fact the loop is just doing `s += 'X';` `limit` times within a loop and instead generating the equivalent of `std::string s(limit, 'X')`, or perhaps even realising that the actual contents of the string aren't relevant and merely replacing the `s.size()` with `limit` - i.e. either the `s += 'X';` or even `s` is entirely optimised away. While I don't doubt the `find` is the bulk of the operation, I expect its inclusion prevents `s += 'X';` and `s` being optimised away. – Pharap May 31 '23 at 05:06

score 16 · Answer 3 · answered Nov 29 '11 at 21:52

16

The idiomatic C++ solution would be:

#include <iostream>
#include <string>
#include <algorithm>

int main()
{
    const int limit = 102 * 1024;
    std::string s;
    s.reserve(limit);

    const std::string pattern("ABCDEFGHIJKLMNOPQRSTUVWXYZ");

    for (int i = 0; i < limit; ++i) {
        s += 'X';
        if (std::search(s.begin(), s.end(), pattern.begin(), pattern.end()) != s.end())
            std::cout << "Omg Wtf found!";
    }
    std::cout << "X's length = " << s.size();
    return 0;
}

I could speed this up considerably by putting the string on the stack, and using memmem -- but there seems to be no need. Running on my machine, this is over 10x the speed of the python solution already..

[On my laptop]

time ./test X's length = 104448 real 0m2.055s user 0m2.049s sys 0m0.001s

answered Nov 29 '11 at 21:52

Heptic

3,076
4
30
51

2

Confirmed. g++ 4.4.3. In my test 5s for search, 12.5s for find (both in the same exe; my test times are longer as I pre-created the string with `std::string s(limit,'X');` I.e. search and find had more work to do.) CONCLUSION: stdlib find() on g++ has lots of potential for optimization! – Darren Cook Dec 01 '11 at 01:15
1

Wow; added a memmem() version, and it is 0.75s (using the same string, accessed via c_str()). (Actually, it was 0; the whole loop seemed to get optimized away; so I added some minor computation to the loop to stop that.) NEW CONCLUSION: find() and search() are doing something weird, that even -O3 cannot optimize, or memmem is using some special CPU feature. Fascinating! – Darren Cook Dec 01 '11 at 01:35
8

The reason std::search is faster than std::string::search is the because (by convention?) std::search is implemented in the header which gives the compiler much more room to optimise. std::string::search on the other hand is not. (And because this is calling the function so many times, it makes a big different) – Heptic Dec 01 '11 at 22:29
5

@Heptic: Um. `std::string` is but a typedef for `std::basic_string`, which is a template, and as such fully implemented in headers. – sbi Dec 01 '12 at 20:43
@sbi Unless `std::basic_string` is specialised, in which case the functions may be implemented in a `.cpp` because they then behave as functions on a 'normal' class rather than a templated class. Also some compilers attempt to 'precompile' headers using various tricks, which may also interfere (though I don't think GCC is one of those as far as I remember) – Pharap May 31 '23 at 05:14

score 10 · Answer 4 · edited Nov 29 '11 at 11:47

10

That is the most obvious one: please try to do s.reserve(limit); before main loop.

Documentation is here.

I should mention that direct usage of standard classes in C++ in the same way you are used to do it in Java or Python will often give you sub-par performance if you are unaware of what is done behind the desk. There is no magical performance in language itself, it just gives you right tools.

edited Nov 29 '11 at 11:47

mac

42,153
26
121
131

answered Nov 29 '11 at 11:43

Mihails Strasuns

3,783
1
18
21

On my machine adding `s.reserve(limit)` before the loop makes no perceptible difference to performance. – NPE Nov 29 '11 at 11:44
I agree with what you are saying *in general*, but have you tested this? With gcc 4.6 I don't get any speedup when using `string::reserve`. Can you show how to do the concatenation in a fast way, exploiting the knowledge of how the classes work in the background? – Szabolcs Nov 29 '11 at 11:48
Is that really an issue here? Each `string::operator++` is only appending a single character, so memory reallocation and copying shouldn't be a big drain. – Nov 29 '11 at 11:48
Yes I tried s.reserve(102 * 1024), but no help. It's about 5.895s and little improvement. I think the bottle neck is in string search. – Wu Shu Nov 29 '11 at 11:49
1

Well, checked this in practice. Replacing s += "X" with string s(102*1024, 'X'); made enormous speed improvement ( real 0m0.003s in my VBox ). std::string::reserve not helping though, despite what I have said ( should have had same effect in my opinion though ). Need to investigate a bit more. Edited: lol, only now have paid attention to the way for loop is stated :) ok, rollback everything – Mihails Strasuns Nov 29 '11 at 12:24
1

Of course building the string made enormous speed improvement. You then bypass the loop completely... You need to change the loop condition to iterate on a `i = 0` variable if you allocate the string first, and then you'll notice that the search is the real issue. – Matthieu M. Nov 29 '11 at 12:58

score 8 · Answer 5 · 2011-11-29T12:06:54.617

8

My first thought is that there isn't a problem.

C++ gives second-best performance, nearly ten times faster than Java. Maybe all but Java are running close to the best performance achievable for that functionality, and you should be looking at how to fix the Java issue (hint - StringBuilder).

In the C++ case, there are some things to try to improve performance a bit. In particular...

s += 'X'; rather than s += "X";
Declare string searchpattern ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"); outside the loop, and pass this for the find calls. An std::string instance knows it's own length, whereas a C string requires a linear-time check to determine that, and this may (or may not) be relevant to std::string::find performance.
Try using std::stringstream, for a similar reason to why you should be using StringBuilder for Java, though most likely the repeated conversions back to string will create more problems.

Overall, the result isn't too surprising though. JavaScript, with a good JIT compiler, may be able to optimise a little better than C++ static compilation is allowed to in this case.

With enough work, you should always be able to optimise C++ better than JavaScript, but there will always be cases where that doesn't just naturally happen and where it may take a fair bit of knowledge and effort to achieve that.

edited Nov 29 '11 at 12:06

answered Nov 29 '11 at 12:00

The performance is bounded by the `find` call, not the allocation. For example, testing the 2nd point, there is just not difference (at all). – Matthieu M. Nov 29 '11 at 13:07
@Matthieu - well, I didn't say any of my ideas would definitely make a difference. However, the second point is **all about** the `find` call. The point is to use a different overload of `find` which takes the search pattern as an `std::string` rather than as a C string, and thus (possibly but not definitely) avoid `strlen` calls within the `find` call. Another thought is that since the search pattern is constant, a compiled-pattern approach may work faster (Boyer-Moore string search, for example), but that's cheating - unless e.g. JavaScript optimizers are much smarter than I'd expect. – Nov 29 '11 at 13:55
I tested a naive Boyer-Moore (building the table at each step) and it performed worse. The needle is very small (26 characters) compared to the size of the haystack (104448 characters), so the extra complexity balances the speed-up that could be expected. I guess that building the table outside could help... but maybe not as much as expected. – Matthieu M. Nov 29 '11 at 14:00
3

Stringstream will not give any performance improvements here. `std::string` is already mutable and can insert in constant amortized time. – Billy ONeal Dec 01 '11 at 07:49

score 7 · Answer 6 · answered Nov 29 '11 at 13:32

What you are missing here is the inherent complexity of the find search.

You are executing the search 102 * 1024 (104 448) times. A naive search algorithm will, each time, try to match the pattern starting from the first character, then the second, etc...

Therefore, you have a string that is going from length 1 to N, and at each step you search the pattern against this string, which is a linear operation in C++. That is N * (N+1) / 2 = 5 454 744 576 comparisons. I am not as surprised as you are that this would take some time...

Let us verify the hypothesis by using the overload of find that searches for a single A:

Original: 6.94938e+06 ms
Char    : 2.10709e+06 ms

About 3 times faster, so we are within the same order of magnitude. Therefore the use of a full string is not really interesting.

Conclusion ? Maybe that find could be optimized a bit. But the problem is not worth it.

Note: and to those who tout Boyer Moore, I am afraid that the needle is too small, so it won't help much. May cut an order of magnitude (26 characters), but no more.

There is no `A` in the hay-stack, so it should just check each character in the string that it is not found and not look at the other characters of the pattern. You seem to be describing `find_any_of` method, which again should find the `'X'` very fast here. — UncleBens, Nov 29 '11 at 17:11
@UncleBens: not at all, I am talking about `find`, which even for a string pattern should stop on the first character of the pattern if it does not match and move on in the haystack. The fact that looking for a single character `A` (the first character of the pattern) is only 3 times faster confirms my suspicion that it is not the pattern search that is slow, but simply that looking for a pattern in such a long string so many times is plain slow in itself. — Matthieu M., Nov 29 '11 at 20:32

score 4 · Answer 7 · answered Nov 29 '11 at 11:59

4

For C++, try to use std::string for "ABCDEFGHIJKLMNOPQRSTUVWXYZ" - in my implementation string::find(const charT* s, size_type pos = 0) const calculates length of string argument.

answered Nov 29 '11 at 11:59

dimba

26,717
34
141
196

score 3 · Answer 8 · answered Sep 04 '12 at 16:24

C/C++ language are not easy and take years make fast programs.

with strncmp(3) version modified from c version:

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test()
{
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        if (!strncmp(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main()
{
    test();
    return 0;
}

score 3 · Answer 9 · answered Nov 29 '11 at 12:15

I just tested the C++ example myself. If I remove the the call to std::sting::find, the program terminates in no time. Thus the allocations during string concatenation is no problem here.

If I add a variable sdt::string abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and replace the occurence of "ABC...XYZ" in the call of std::string::find, the program needs almost the same time to finish as the original example. This again shows that allocation as well as computing the string's length does not add much to the runtime.

Therefore, it seems that the string search algorithm used by libstdc++ is not as fast for your example as the search algorithms of javascript or python. Maybe you want to try C++ again with your own string search algorithm which fits your purpose better.

Well, if you remove string::find, this is just string concatenation, and this would be no much difference between languages/runtimes optimized for string: string in C++ is also much more optimized than in C (string as an array of char). string::find is not just a test for searching algorithm, but also a test for traversing string. I'll make another test. — Wu Shu, Nov 29 '11 at 14:48

score 2 · Answer 10 · answered Nov 29 '11 at 11:58

2

Your test code is checking a pathological scenario of excessive string concatenation. (The string-search part of the test could have probably been omitted, I bet you it contributes almost nothing to the final results.) Excessive string concatenation is a pitfall that most languages warn very strongly against, and provide very well known alternatives for, (i.e. StringBuilder,) so what you are essentially testing here is how badly these languages fail under scenarios of perfectly expected failure. That's pointless.

An example of a similarly pointless test would be to compare the performance of various languages when throwing and catching an exception in a tight loop. All languages warn that exception throwing and catching is abysmally slow. They do not specify how slow, they just warn you not to expect anything. Therefore, to go ahead and test precisely that, would be pointless.

So, it would make a lot more sense to repeat your test substituting the mindless string concatenation part (s += "X") with whatever construct is offered by each one of these languages precisely for avoiding string concatenation. (Such as class StringBuilder.)

answered Nov 29 '11 at 11:58

Mike Nakis

56,297
11
110
142

1

I have just checked the example code myself, and it turns out that almost all of the runtime is spend during string search. – swegi Nov 29 '11 at 12:07
o_O -- OK, then there is something totally weird going on. Prior to posting my answer I checked the documentation of all the find() and indexOf() methods in all of the above languages to make sure that they all perform straight non-regex, case-sensitive search. So, if search is the problem despite the triviality of the task, I do not know what to say. – Mike Nakis Nov 29 '11 at 12:19
Well, I checked only the C++ example, I think you are right for the really poor performance of the Java example. – swegi Nov 29 '11 at 12:21
@MikeNakis - a straight search can benefit from knowing the length of the search pattern - limiting the range where it searches and even dismissing the possibility of searching for a long pattern in a short string at all. A C string doesn't know it's own length, so either the `find` operation will use a linear-time `strlen` on the pattern for each call, or else it will work without that advanced knowledge of the length and do more work searching as a result. – Nov 29 '11 at 12:26
1

@swegi which languages did you check? I expect it may vary between them. With Python 2.7 the code as written takes 13.1s on my system, removing the `find` call it takes 0.019s. So the string concatenation (at least on Python) is the irrelevant part of the test. This is probably only true because the C version of Python uses reference counting and can do the concatenation in-place when it can detect that there is only one reference to the string. – Duncan Nov 29 '11 at 12:31
3

`std::string::operator+=` *is* the construct offered by C++ for avoiding the thing that in Java causes String concatenation to be slow. `std::string` is a mutable class, same as `StringBuilder`. TBH I think it's a bit confusing that the question is "why is C++ slow?", but includes Java results that are *waaay* slower, prompting various people to explain why the Java results are slow. Which is irrelevant to the question ;-) – Steve Jessop Nov 29 '11 at 12:43
@Steve314 I thought the same, but I tried searching for "A" instead of "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and there is no change in the running time. (Speaking of the C++ sample.) – Mike Nakis Nov 29 '11 at 12:53
@MikeNakis - actually, now I think about it, I'd expect that result. A straight string search for "ABC..." in a string of "XXX..." reduces to a straight search for 'A' (each check starting at a particular position fails at the first character - `'A' != 'X'`). This even raises questions about just how smart optimizers may be - find the right language and compiler and maybe all the searching will be optimized away (proven unnecessary) at compile-time. With Haskell and GHC, for example, you may find that the final executable just outputs a constant "x's length ..." string. – Nov 29 '11 at 14:02
@Steve314 I beg to differ; if that was possible, then the "Halting Problem" would be solvable. http://en.wikipedia.org/wiki/Halting_problem. – Mike Nakis Nov 29 '11 at 15:55
@MikeNakis - this program isn't all possible programs. Haskell may well do what I said, not by recognising a specific special case, but because there's no run-time specific data to block it from executing pretty much the whole program at compile-time. For the more specific optimisation, is it possible for a particular compiler to spot that the string to search will never contain an "A"? Yes. But it's an unlikely case in itself, except in that it might be a special case of a more general (but still not universal) optimisation. – Nov 29 '11 at 16:04
@MikeNakis - while I won't say that *will* happen, and especially not in any particular language, if you have studied Haskell to any depth it might change some of your assumptions about what is and what is not plausible. – Nov 29 '11 at 16:06
OK, I got your point, you pointed out my mistake when you said "this program isn't all possible programs". – Mike Nakis Nov 29 '11 at 17:08

score 1 · Answer 11 · answered Sep 12 '15 at 17:02

As mentioned by sbi, the test case is dominated by the search operation. I was curious how fast the text allocation compares between C++ and Javascript.

System: Raspberry Pi 2, g++ 4.6.3, node v0.12.0, g++ -std=c++0x -O2 perf.cpp

C++ : 770ms

C++ without reserve: 1196ms

Javascript: 2310ms

C++

#include <iostream>
#include <string>
#include <chrono>
using namespace std;
using namespace std::chrono;

void test()
{
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    int x = 0;
    int limit = 1024 * 1024 * 100;
    string s("");
    s.reserve(1024 * 1024 * 101);
    for(int i=0; s.size()< limit; i++){
        s += "SUPER NICE TEST TEXT";
    }

    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>( t2 - t1 ).count();
    cout << duration << endl;
}

int main()
{
    test();
}

JavaScript

function test()
{
    var time = process.hrtime();
    var x = "";
    var limit = 1024 * 1024 * 100;
    for(var i=0; x.length < limit; i++){
        x += "SUPER NICE TEST TEXT";
    }

    var diff = process.hrtime(time);
    console.log('benchmark took %d ms', diff[0] * 1e3 + diff[1] / 1e6 );
}

test();

score 1 · Answer 12 · edited Sep 19 '19 at 20:52

1

It seems that in nodejs there are better algorithms for substring search. You can implement it by yourself and try it out.

edited Sep 19 '19 at 20:52

BugShotGG

5,008
8
47
63

answered Nov 29 '11 at 11:53

xandox

336
1
2
9

Why do std::string operations perform poorly?

in Python (2.7.2):

in Java (OpenJDK-7):

in Javascript (Nodejs 0.6.3)

in C++ (g++ -Ofast)

Brief Summary

12 Answers12

Linked