std::string performance for handling short strings

Question

Bjarne Stroustrup and other experts said that C++ is faster than C for handling for short strings in Bjarne Stroustrup' article and my previous question

But in my test, C++ was about 110% slower than C.

g++ version is 4.4.6 (runs on CentOS 6.3). Is this because g++ 4.4.6 has less c++11 feature such as Rvalue Reference (move semantics)?

Test Result

output of $ time a.out input_file minus execution time of no calling compose_X() function

cpp version : 0.192 sec
C version : 0.091 sec

source code

compiled with -O2

Edit

compose_cpp() and compose_p() come from Bjarne's article. He said that compose_cpp() is fater than compose_p(). I want to check this fact with real test.

If I have tested wrong way, how can I improve test?

#include <iostream>
#include <fstream>

#include <cstdlib>
#include <cstring>

std::string compose_cpp(const std::string& name, const std::string& domain)
{
    return name + '@' + domain;
}

char* compose_c(const char* name, const char* domain)
{
    char* res = (char*) malloc(strlen(name)+strlen(domain)+2);
    char* p = strcpy(res,name);

    p += strlen(name);
    *p = '@';
    strcpy(p+1,domain);

    return res;
}

int main(int argc, char* argv[])
{
    std::ifstream ifs;
    ifs.open(argv[1]);

    std::string email, domain;

    while (ifs.good())
    {
        ifs >> email;
        ifs >> domain;

        // std::string composed = compose_cpp(email, domain);

        char* composed = compose_c(email.c_str(), domain.c_str());
        free(composed);
    }

    ifs.close();
}

input file

input file is 1 millon lines long. every line is less than 20 bytes, generated randomly.

$ head -n 10 input.txt.1m
9742720 1981857.com
22504 4127435.com
342760 69167.com
53075 26710.com
3837481 1851920.com
98441 278536.com
4503887 9588108.com
193947 90885.com
42603 8166125.com
3587671 297296.com

Your C code allocates exactly the correct amount of space, but your C++ code has to guess. — Dietrich Epp, Dec 24 '14 at 07:04
I think some people don't like language comparison questions. However, this question should be fine. — Dietrich Epp, Dec 24 '14 at 07:06
@DietrichEpp Ok. If all strings are less than 20 characters, Can I improve c++ code? — Jason Heo, Dec 24 '14 at 07:07
I don't get it, what the hell are you benchmarking? File I/O? By running one cold and one hot? — Blindy, Dec 24 '14 at 07:07
@InoSHeo I asked before "how you are measuring" and you seem to be ignoring my question. That's integral to others reproducing the same results. — , Dec 24 '14 at 07:07
@Blindy In my test code, both c and c++ have same I/O cost. test has taken several times, result is average. — Jason Heo, Dec 24 '14 at 07:08
@InoSHeo: However, if you run one benchmark first and the other benchmark afterwards, the second one will run with a warm cache. — Dietrich Epp, Dec 24 '14 at 07:09
@Blindy Would you please test with MSVC and share results? MSVS seems to have better short string optimization than `g++`. I don't have MSVC right now. — Jason Heo, Dec 24 '14 at 08:01
@Blindy Where have gone your comments? Can I ask you something? If you familiar with C++ (as you said in profile), what do you think about Bjarne's article? Which is faster and why? and any test result. please give me some knowlegdes. — Jason Heo, Dec 24 '14 at 12:12

score 2 · Accepted Answer · edited May 23 '17 at 10:33

2

I'm just going to put a guess here because I don't have the data file to test with. I think your results may not match Stroustrup's expectation because of what he says here:

Yes, the C++ version, because it does not have to count the argument characters and does not use the free store (dynamic memory) for short argument strings.

However, my understanding is that libstdc++ uses dynamic memory for all strings (except for zero length strings). See this recent SO answer to a question about the small size of the std::string object in libstdc++: https://stackoverflow.com/a/27631366/12711

It's possible you would have better results with an implementation that uses the short string optimization (like MSVC - I'm not sure if clang's libc++ uses it or not).

edited May 23 '17 at 10:33

Community

1
1

answered Dec 24 '14 at 07:31

Michael Burr

333,147
50
533
760

I'd like to add that short string optimizations make long strings perform worse (causing branch mispredictions and making data locality worse when strings are longer). – Dietrich Epp Dec 24 '14 at 07:46
1

Also I think that MSVC uses the short string optimization for strings less than 16 characters, so your test may still run into memory allocation overhead in the C++ case. – Michael Burr Dec 24 '14 at 07:49
I'll keep in mind *16 characters* if I test with MSVC. Thanks. – Jason Heo Dec 24 '14 at 07:51
1

@DietrichEpp: are you sure of that? I'd think that the dynamic allocation would completely dominate any branch prediction hit (which should largely happen only when memory is potentially being allocated or deallocated), and as far as data locality goes - whether or not the short string optimization is used, long strings will be stored somewhere on the heap so the locality hit would be the same. – Michael Burr Dec 24 '14 at 07:54
In the libc++ `std::string` implementation, a string is three words. One byte distinguishes short from long words, and that *has* to be examined whenever you want to access character data. If some strings are short and some long, you get branch mispredictions. You're right about cache locality not being a problem, but it's for a different reason--strings would be three words anyway, whether or not you used short string optimization. – Dietrich Epp Dec 24 '14 at 10:45
However, comparing `char *` to `std::string`, the `char *` version will have better locality for heap-allocated strings because the objects containing strings will be smaller. Not advocating `char *` here, just noting that it is more efficient with memory, even if both versions use the heap. – Dietrich Epp Dec 24 '14 at 10:48

Jerry Coffin · Answer 2 · 2014-12-24T12:53:15.610

I took the liberty of expanding your test program out a little bit. In particular, I added code to have it generate its data internally instead of depending on an outside file, added timing code to isolate the string handling in question, and had it do both the C++ and C versions of the string manipulation in the same run, so it immediately produced results I could compare. That gave me the following code:

#include <iostream>
#include <fstream>
#include <vector>
#include <string>

#include <cstdlib>
#include <cstring>

#include <ctime>

char* compose_c(const char* name, const char* domain)
{
    char* res = (char*) malloc(strlen(name)+strlen(domain)+2);
    char* p = strcpy(res,name);

    p += strlen(name);
    *p = '@';
    strcpy(p+1,domain);

    return res;
}

std::string rand_string(int size){
    std::string ret;

    for (int i = 0; i < size; i++)
        ret.push_back(rand() % 10 + '0');
    return ret;
}

struct address {
    std::string email, domain;

    address() : email(rand_string(5)), domain(rand_string(4) + ".com") { }
};

struct composed {
    std::string addr;
    composed(address const &a) : addr(a.email + "@" + a.domain) {}
};

void report(clock_t d, std::string const &label){
    std::cout << double(d) / CLOCKS_PER_SEC << " seconds for " << label << "\n";
}

int main(int argc, char **argv) {
    static const int NUM = 1024 * 1024;

    std::vector<address> addresses(NUM);

    clock_t start = clock();
    {
        std::vector<composed> c{ addresses.begin(), addresses.end() };
    }
    report(clock() - start, "C++");

    std::vector<char *> c_results(addresses.size());

    clock_t start_c = clock();
    for (int i = 0; i < addresses.size(); i++)
        c_results[i] = compose_c(addresses[i].email.c_str(), addresses[i].domain.c_str());
    for (char *c : c_results)
        free(c);
    report(clock() - start_c, "C");
}

I then compiled that with VC++ 2013, x64 using the flags: -O2b2 -GL -EHsc. When I ran that, the result I got was:

0.071 seconds for C++
0.12 seconds for C

While there's some variation from run to run, those are fairly representative results--the C code takes almost (but not quite) twice as long as the C++ code.

Note that this is despite the fact that I've actually given the C version a bit of an unfair advantage. The timing for the C++ code includes not only the time to do the string manipulation, but also the time to create and destroy the vector to hold the results. For the C code, I prepare the vector ahead of time, then time only the code to create and destroy the strings themselves.

To ensure this result wasn't a fluke, I also tried changing some of the independent variables. For example, increasing the number of address strings we compose by a factor of 10 increased the overall time, but had almost no effect on the ratio:

0.714 seconds for C++
1.206 seconds for C

Likewise, changing the order so the C code ran first, and the C++ code ran second had no discernible effect.

I suppose I should add: it's true that as it stands, this code doesn't actually use the compose_cpp function as the original did, choosing to incorporate the functionality into the constructor for composed instead. For the sake of completeness, I did write a version that used compose_cpp, like this:

std::vector<std::string> composed;
composed.reserve(NUM);

clock_t start = clock();

for (auto const &a : addresses)
    composed.push_back(compose_cpp(a.email, a.domain));

This actually improves the timing a little bit, but I'd guess it's mostly due to moving the time to create the vector itself out of the timing code, and doesn't make a big enough difference to care much about:

0.631 seconds for C++
1.21 seconds for C

These results do depend heavily upon the standard library implementation though--specifically, the fact that std::string implements the short-string optimization. Running the same code on the same hardware, but using an implementation that lacks this optimization (the nuwen MinGW distribution of gcc 4.9.1, in my case) gives quite different results:

2.689 seconds for C++
1.131 seconds for C

In this case, the C code is a little faster than the code from VC++, but the C++ code has slowed by a factor of about 4. I tried some different compiler flags (-O2 vs. -O3, etc.) but they had only minimal effect--for this test, the lack of short string optimization clearly dominates the other factors.

Bottom line: I think this confirms that the C++ code can be substantially faster than the C code, but achieving that speed depends much more on the quality of implementation. If the implementation fails to provide the short string optimization, the C++ code can easily be 2x slower instead of 2x faster than the C version.

Thank you for sharing various test results! Merry christmas ;) — Jason Heo, Dec 24 '14 at 12:48
Then, what's 'wrong' with my test in [this gist](https://gist.github.com/martinmoene/bd290bc0ff8248de31fd) that shows VC13's C version is quicker than its C++ version with a short string of 9 characters? — Martin Moene, Dec 30 '14 at 11:58
I don't know if this is still the case, but msvc used a checked STL the last time I used it (this was with vc++2010, so things may be different know). In this case, the C++ code includes bounds-checking and other sanity checks, but the C code doesn't. — Jens, Jan 15 '18 at 07:57

std::string performance for handling short strings

2 Answers2

Linked