2

I wrote the following string concatenation function (join) to reduce the number of allocations and the time spent for constructing the final string. I also wanted to write an easy to use appending function (one-liner if possible).

size_t str_size(const char *str) {
    return std::strlen(str);
}

size_t str_size(const std::string &str) {
    return str.size();
}

template <typename T>
size_t accumulated_size(const T& last) {
    return str_size(last);
}

template <typename T, typename... Args>
size_t accumulated_size(const T& first, const Args& ...args) {
    return str_size(first) + accumulated_size(args...);
}

template <typename T>
void append(std::string& final_string, const T &last) {
    final_string += last;
}

template <typename T, typename... Args>
void append(std::string& final_string, const T& first, const Args& ...args) {
    final_string += first;
    append(final_string, args...);
}

template <typename T, typename... Args>
std::string join(const T& first, const Args& ...args) {
    std::string final_string;

    final_string.reserve(accumulated_size(first, args...));
    append(final_string, first, args...);

    return std::move(final_string);
}

I tested the join method against typical built-in C++ concatenation functionality using the operator+= and also the operator+ of the std::string class on a fairly large amount of strings. How and why is my method yielding poorer results in terms of time execution compared to the plain operator+= or operator+ approach?

I'm using the following class to measure the time:

class timer {
public:
    timer() {
        start_ = std::chrono::high_resolution_clock::now();
    }

    ~timer() {
        end_ = std::chrono::high_resolution_clock::now();
        std::cout << "Execution time: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end_ - start_).count() << " ns." << std::endl;
    }

private:
    std::chrono::time_point<std::chrono::high_resolution_clock> start_;
    std::chrono::time_point<std::chrono::high_resolution_clock> end_;
};

I'm comparing the following way:

#define TEST_DATA "Lorem", "ipsum", "dolor", "sit", "ame", "consectetuer", "adipiscing", "eli", "Aenean",\
                    "commodo", "ligula", "eget", "dolo", "Aenean", "mass", "Cum", "sociis", "natoque",\
                    "penatibus", "et", "magnis", "dis", "parturient", "monte", "nascetur", "ridiculus",\
                    "mu", "Donec", "quam", "feli", ", ultricies", "ne", "pellentesque", "e", "pretium",\
                    "qui", "se", "Nulla", "consequat", "massa", "quis", "eni", "Donec", "pede", "just",\
                    "fringilla", "ve", "aliquet", "ne", "vulputate", "ege", "arc", "In", "enim", "just",\
                    "rhoncus", "u", "imperdiet", "", "venenatis", "vita", "just", "Nullam", "ictum",\
                    "felis", "eu", "pede", "mollis", "pretiu", "Integer", "tincidunt"

#define TEST_DATA_2 std::string("Lorem") + "ipsum"+ "dolor"+ "sit"+ "ame"+ "consectetuer"+ "adipiscing"+ "eli"+ "Aenean"+\
                    "commodo"+ "ligula"+ "eget"+ "dolo"+ "Aenean"+ "mass"+ "Cum"+ "sociis"+ "natoque"+\
                    "penatibus"+ "et"+ "magnis"+ "dis"+ "parturient"+ "monte"+ "nascetur"+ "ridiculus"+\
                    "mu"+ "Donec"+ "quam"+ "feli"+ ", ultricies"+ "ne"+ "pellentesque"+ "e"+ "pretium"+\
                    "qui"+ "se"+ "Nulla"+ "consequat"+ "massa"+ "quis"+ "eni"+ "Donec"+ "pede"+ "just"+\
                    "fringilla"+ "ve"+ "aliquet"+ "ne"+ "vulputate"+ "ege"+ "arc"+ "In"+ "enim"+ "just"+\
                    "rhoncus"+ "u"+ "imperdiet"+ ""+ "venenatis"+ "vita"+ "just"+ "Nullam"+ "ictum"+\
                    "felis"+ "eu"+ "pede"+ "mollis"+ "pretiu"+ "Integer"+ "tincidunt"

int main() {
    std::string string_builder_result;
    std::string normal_approach_result_1;
    std::string normal_approach_result_2;

    {
        timer t;
        string_builder_result = join(TEST_DATA);
    }

    std::vector<std::string> vec { TEST_DATA };
    {
        timer t;
        for (const auto & x : vec) {
            normal_approach_result_1 += x;
        }
    }

    {
        timer t;
        normal_approach_result_2 = TEST_DATA_2;
    }
}

My results are:

  • Execution time: 11552 ns (join approach).
  • Execution time: 3701 ns (operator+=() approach).
  • Execution time: 5898 ns (operator+() approach).

I'm compiling with: g++ efficient_string_concatenation.cpp -std=c++11 -O3

cpplearner
  • 13,776
  • 2
  • 47
  • 72
cuv
  • 1,159
  • 2
  • 10
  • 20
  • Please include your tests, including compiler flags and data used. There is a long tradition of poorly done profiling in performance C++ testing here. – Yakk - Adam Nevraumont Sep 06 '17 at 10:48
  • 1
    You should be using steady_clock, not high_resolution_clock for timing. Are you not compiling with -O3 turned on? – 0xBADF00 Sep 06 '17 at 11:27
  • I just did compile with -O3. It actually improves but still not better than the other approach. Results: Execution time: 8949 ns. (join approach), Execution time: 3475 ns. (operator += approach) – cuv Sep 06 '17 at 11:41
  • I am not an expert on this, but your using a recursive call in your append function. My guess is that it will impact your performance compared to looping over a vector – 0xBADF00 Sep 06 '17 at 11:49
  • @0xBADF00 you are right, I am also pre-calculating the size of the resulting string which makes a bunch of calls to `std::strlen` which also eats more time. it's weird that this also performs worse than the alternative using the `operator+()` of the std::string which makes a lot of allocations – cuv Sep 06 '17 at 12:10
  • With `normal_approach_result_1`, you have moved all `strlen` calls outside the timer, whereas the two other approaches include them in the measurement. Also, `operator+` in your case is no less efficient than `operator+=`. There's still one buffer, growing exponentially, into which all strings are accumulated; this buffer is handed over from temporary to temporary via move. So I think it mostly comes down to `strlen` - the fastest approach doesn't do them, the second fastest does them once, the slowest does them twice. – Igor Tandetnik Sep 06 '17 at 14:18
  • 2
    Avoid moving the returned string, it prevents RVO. See e.g. https://stackoverflow.com/questions/19267408/why-does-stdmove-prevent-rvo – Bob__ Sep 06 '17 at 14:30
  • to understand your code performance you need a profiler. Valgrind is your default choice on Linux, VisualStudio Profiler on Windows. It takes a little bit to get used to them, but it's one of the best time investment for a programmer – Andriy Tylychko Sep 06 '17 at 14:38

2 Answers2

4

operator+ has an rvalue reference left hand side std::string. The += code you wrote isn't fundamentally better than a long chain of +.

Its + can use exponential reallocation, starting around 10 or so. At a 1.5 growth factor that is about 9 allocations and reallocations.

The recursion could be confusing or slowing things down. You can fix this:

template <typename... Args>
void append(std::string& final_string, const Args&... args) {
  using unused=int[];
  (void)unused{0,(void(
    final_string += args
  ),0)...};
}

which eliminates that recursion, and similarly:

template <typename... Args>
size_t accumulated_size(const Args& ...args) {
  size_t r = 0;
  using unused=int[];
  (void)unused{0,(void(
    r += str_size(args)
  ),0)...};
  return r;
}

but, it may not be worth visiting that many strings to calculate their length to save 8 reallocations.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • That `(void)unused={...};` trick can be replaced by `((final_string += args), ...);` using C++17 fold expressions, am I right? – Bob__ Sep 06 '17 at 16:36
  • 1
    @Bob__ yes, but this is C++11 – Yakk - Adam Nevraumont Sep 06 '17 at 16:38
  • What's this "trick" you used to get rid of recursion ? I am not familiar at all with it, first time I encounter it. Can you point me to a resource where I can read more about it ? – cuv Sep 07 '17 at 06:48
  • @cuvidk look at https://stackoverflow.com/questions/28110699/what-does-this-variadic-template-code-do or https://stackoverflow.com/a/17340003/4944425 – Bob__ Sep 07 '17 at 11:44
  • @Yakk removing the recursion and cutting out the computation regarding the accumulated size of strings yields more or less the same results. This is weird. I expected at least an improvement that can overcome the results of the `std::string::operator+` method. This current technique is mostly the same with the `std::string::operator+=` method in my eyes and still results in poorer execution time. – cuv Sep 08 '17 at 06:39
  • @cuvidk order of test is next; mayhap 1st one pays to load strings into cpu cache. – Yakk - Adam Nevraumont Sep 08 '17 at 11:25
  • @Yakk this is amazing.. very interesting – cuv Sep 08 '17 at 11:39
-1

Please do not deal with strings for that.

Use string streams, or create your own StringBuilder like this one: https://www.codeproject.com/Articles/647856/Performance-Improvement-with-the-StringBuilde

Specialized string builders recommended due to its intelligent allocations management (sometimes they support list of string chunks, sometimes - prediction of growth). Allocations are hard and very time consuming operations.

Akzhan Abdulin
  • 993
  • 6
  • 8
  • This question is looking for an *explanation*, not simply for a recommendation. Your answer provides no insight for the questioner, and may be deleted. Please [edit] to explain what causes the observed symptoms. – Toby Speight Sep 06 '17 at 14:53