2

I want to join thousands of strings in C++. I have no way of knowing the size of the resulting string in advance. Currently I join the strings using the '+' operator.

Unfortunately, this process takes a lot of time in my program.

In Java I would use StringJoiner which is much faster. Is there anything similar in C++?

I have read similar things online, and I have also read this question, but it is quite old (9 years old) and I imagine things have changed since.

andreas
  • 1
    You could write your own iterator to spawn the multiple strings, and use it to initialize a new string. If the iterator is random access, `std::string`'s constructor should be able to use `std::distance` to preallocate memory only once and then use the iterator to copy each `char` only once. – François Andrieux Apr 05 '18 at 15:10
  • "Currently I join the strings using the '+' operator." I am afraid this is not enough information. It is quite possible you are simply doing something wrong. – Slava Apr 05 '18 at 15:14
  • Could you provide some context ? (How many are there, are they stored in a container ? if so, which one ? etc) – rak007 Apr 05 '18 at 15:27
  • @rak007 The number of strings that will be joined is actually configurable by the user of the application. It could be 100, 1000, or 100,000, or some other number. I receive that many objects in a list from a remote server and later transform them into their string representations. Then I concatenate each one with the string representations of all the other objects received. – andreas Apr 05 '18 at 15:33
  • @andreas that does not sound particularly optimal:( If it was me, I would be trying to write one buffer container directly with the chars from the transform operation? – Martin James Apr 05 '18 at 15:44
  • 1
    Use vector with std::back_inserter – seccpur Apr 05 '18 at 15:48
  • 1
    Depending on what you want to do with the string later, maybe a [rope](https://en.wikipedia.org/wiki/Rope_(data_structure)#Comparison_with_monolithic_arrays) is more suitable? And if you can ask the server to provide more information about the strings before sending them, that would be even better. But I doubt that the network transfer is faster than the concatenation in most cases – phuclv Apr 05 '18 at 15:56
  • 1
    The fundamental issue is that memory needs to be reallocated for each concatenation, regardless of the container. – Thomas Matthews Apr 05 '18 at 15:56
  • @andreas To be clear, do you know the full set of strings to concatenate when you choose to join them, or do you have to continuously add new elements over time? – François Andrieux Apr 05 '18 at 15:59
  • Unless you tell us what the resulting string will be used for, you won't get a useful high-quality answer. My bet is that there's a way to do whatever you want without actually spending time up-front to concatenate the string. – Jeffrey Apr 05 '18 at 18:04
  • The simple answer is just "reserve a lot of space, and when it's all used, reserve much more space". I'm pretty sure this is what Java's StringJoiner does anyway. – Mooing Duck Apr 05 '18 at 18:08
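
The "reserve space and let it grow" idea from the comments above can be sketched roughly as follows. This is only a sketch under assumptions: `join_appending` and `initial_guess` are hypothetical names, the pieces are assumed to be already collected in a `std::vector<std::string>`, and the growth behaviour mentioned in the comments is what common standard library implementations do, not something the question confirms for any particular toolchain.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends each piece to one result string in place,
// instead of writing `result = result + piece`, which builds a temporary
// string on every step.
std::string join_appending(const std::vector<std::string>& pieces,
                           std::size_t initial_guess = 4096) {
    std::string out;
    // A guess only. If the output ends up larger, the string still grows
    // on its own (common implementations enlarge the capacity geometrically).
    out.reserve(initial_guess);
    for (const std::string& piece : pieces) {
        out += piece;  // appends in place; reallocates only when capacity runs out
    }
    return out;
}
```

Whether this is fast enough depends on the data, so it is worth measuring against the current `+` based code before relying on it.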

2 Answers

1

Consider using std::ostringstream defined in header file sstream.

You add data by using operator <<.

The final string you get by calling str().
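
A minimal sketch of that approach, assuming the strings are already available in a `std::vector<std::string>` (the function name `join_with_stream` is made up for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: streams every piece into an std::ostringstream
// and extracts the accumulated text once at the end.
std::string join_with_stream(const std::vector<std::string>& pieces) {
    std::ostringstream buffer;
    for (const std::string& piece : pieces) {
        buffer << piece;  // operator<< appends to the stream's internal buffer
    }
    return buffer.str();  // str() returns a copy of the buffered characters
}
```

Anything else that has an `operator<<` overload (numbers, user-defined types) can be streamed the same way, which may be convenient if the objects have to be converted to text anyway.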

Robert Andrzejuk
  • `*stringstream`s are horribly slow FWIW. Convenient but terrible. – Veedrac Apr 06 '18 at 15:17
  • @Veedrac As always the answer is "it depends". Benchmark: https://gist.github.com/robert-andrzejuk/036dce48eb2df4cdccda3cebe0ecf133 – Robert Andrzejuk Apr 07 '18 at 03:22
  • It depends on the lengths of your strings, the number of strings, the machine, the libraries used... Always benchmark. – Robert Andrzejuk Apr 07 '18 at 03:41
  • That's a pretty misleading benchmark, since not only are you hiding the overhead, you're not showing that it's a factor of 5 slower than a simple variant that reserves memory. "Always benchmark" is a poor excuse for writing slow code. – Veedrac Apr 07 '18 at 04:13
  • @Veedrac What overhead am I hiding? The benchmark library will report the timing for the code within the for(...) loop. – Robert Andrzejuk Apr 07 '18 at 07:18
  • @Veedrac The question says that the size will be unknown upfront, so reserving the memory is not an option. BUT I added this also to the benchmark code, and I don't know where the factor of 5 is? – Robert Andrzejuk Apr 07 '18 at 14:15
  • There is no way in heck that you would be reliably joining 1024 16kiB blocks and not have the opportunity to reserve space. If you're regularly joining 1024 things, you *clearly* do not want to start with a tiny buffer. If the things you are adding are 16kiB large, the overhead of collecting them into an array before you join them will be trivial. – Veedrac Apr 07 '18 at 15:14
  • As to the factor-5, it seems the speed of `ostringstream` is varying wildly depending on what *other* code is commented out. Yet another reason to avoid using "just benchmark it" to avoid using logical reasoning. Regardless, whatever the lottery gives you, not reserving is much slower than reserving. – Veedrac Apr 07 '18 at 15:24
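
The "collect first, then reserve" variant argued for in these comments could look roughly like the sketch below. It is not the code from the linked benchmark; `join_reserved` and its optional `separator` parameter are hypothetical, and the pieces are assumed to have been gathered into a `std::vector<std::string>` beforehand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a first pass adds up the final length so the result
// buffer is allocated once, a second pass copies the pieces into it.
std::string join_reserved(const std::vector<std::string>& pieces,
                          const std::string& separator = "") {
    if (pieces.empty()) {
        return std::string();
    }

    // Pass 1: total size of all pieces plus the separators between them.
    std::size_t total = separator.size() * (pieces.size() - 1);
    for (const std::string& piece : pieces) {
        total += piece.size();
    }

    std::string out;
    out.reserve(total);  // single up-front allocation

    // Pass 2: append everything; capacity is already sufficient.
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (i != 0) {
            out += separator;
        }
        out += pieces[i];
    }
    return out;
}
```

The extra pass only reads the sizes, so its cost is small compared with copying the characters themselves.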
-1

You could use a sstringstream (std::sstringstream) . See the documentation about it on cppref

You could also use Boost to join a list of strings into a single string with boost::algorithm::join, but that could be overkill depending on your project.

rak007
  • I think you mean std::stringstream? This is what I would usually use, though looking at std::ostringstream as suggested in the other answer that might be a better choice if you don't need it to also behave like an istream (std::stringstream inherits from both ostream and istream). – Sean Burton Apr 06 '18 at 10:07