Disclaimer: This problem is more of a theoretical, rather than a practical interest. I want to find out various different ways of doing this, with speed as icing on the new year cake.
The Problem
I want to be able to store a list of strings, and be able to quickly combine them into 1 if needed.
In short, I want to condense a structure (currently a std::vector<std::string>
) that looks like
["Hello, ", "good ", "day ", " to", " you!"]
to
["Hello, good day to you!"]
- Is there any idiomatic way to achieve this, ala python's
[ ''.join(list_of_strings) ]
? - What is the best way to achieve this in C++, in terms of time?
Possible Approaches
The first idea I had is to
- loop over the vector,
- append each element to the first,
- simultaneously delete the element.
We will be concatenating with +=
and reserve()
. I assume that max_size()
will not be reached.
Approach 1 (The Greedy Approach)
So called because it ignores conventions and operates in-place.
#if APPROACH == 'G' // Greedy Approach void condense(std::vector< std::string >& my_strings, int total_characters_in_list) { // Reserve the size for all characters, less than max_size() my_strings[0].reserve(total_characters_in_list); // There are strings left, ... for(auto itr = my_strings.begin()+1; itr != my_strings.end();) { // append, and... my_strings[0] += *itr; // delete, until... itr = my_strings.erase(itr); } } #endif
Now I know, you would say that this is risky and bad. So:
- loop over the vector,
- append each element to another
std::string
, - clear the vector and make the string first element of the vector.
Approach 2 (The "Safe" Haven)
So called because it does not modify the container while iterating over it.
#if APPROACH == 'H' // Safe Haven Approach void condense(std::vector< std::string >& my_strings, int total_characters_in_list) { // Store the whole vector here std::string condensed_string; condensed_string.reserve(total_characters_in_list); // There are strings left... for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr) { // append, until... condensed_string += *itr; } // remove all elements except the first my_strings.resize(1); // and set it to condensed_string my_strings[0] = condensed_string; } #endif
Now for the standard algorithms...
Using std::accumulate
from <algorithm>
Approach 3 (The Idiom?)
So called simply because it is a one-liner.
#if APPROACH == 'A' // Accumulate Approach void condense(std::vector< std::string >& my_strings, int total_characters_in_list) { // Reserve the size for all characters, less than max_size() my_strings[0].reserve(total_characters_in_list); // Accumulate all the strings my_strings[0] = std::accumulate(my_strings.begin(), my_strings.end(), std::string("")); // And resize my_strings.resize(1); } #endif
Why not try to store it all in a stream?
Using std::stringstream
from <sstream>
.
Approach 4 (Stream of Strings)
So called due to the analogy of C++'s streams with flow of water.
#if APPROACH == 'S' // Stringstream Approach void condense(std::vector< std::string >& my_strings, int) // you can remove the int { // Create out stream std::stringstream buffer(my_strings[0]); // There are strings left, ... for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr) { // add until... buffer << *itr; } // resize and assign my_strings.resize(1); my_strings[0] = buffer.str(); } #endif
However, maybe we can use another container rather than
std::vector
?
In that case, what else?
(Possible) Approach 5 (The Great Indian "Rope" Trick)
I have heard about the rope data structure, but have no idea if (and how) it can be used here.
Benchmark and Verdict:
Ordered by their time efficiency (currently and surprisingly) is1:
Approaches Vector Size: 40 Vector Size: 1600 Vector Size: 64000
SAFE_HAVEN: 0.1307962699997006 0.12057728999934625 0.14202970000042114
STREAM_OF_STRINGS: 0.12656566000077873 0.12249500000034459 0.14765803999907803
ACCUMULATE_WEALTH: 0.11375975999981165 0.12984520999889354 3.748660090001067
GREEDY_APPROACH: 0.12164988000004087 0.13558526000124402 22.6994204800023
timed with2:
NUM_OF_ITERATIONS = 100
test_cases = [ 'greedy_approach', 'safe_haven' ]
for approach in test_cases:
time_taken = timeit.timeit(
f'system("{approach + ".exe"}")',
'from os import system',
number = NUM_OF_ITERATIONS
)
print(approach + ": ", time_taken / NUM_OF_ITERATIONS)
Can we do better?
Update: I tested it with 4 approaches (so far), as I could manage in my little time. More incoming soon. It would have been better to fold the code, so that more approaches could be added to this post, but it was declined.
1 Note that these readings are only for a rough estimate. There are a lot of things that influence the execution time, and note that there are some inconsistencies here as well.
2 This is the old code, used to test only the first two approaches. The current code is a good deal longer, and more integrated, so I am not sure I should add it here.
Conclusions:
- Deleting elements is very costly.
- You should just copy the strings somewhere, and resize the vector.
- Infact, better reserve enough space too, if copying to another string.