24

I was trying to pick a standard way to convert integral types to strings, so I ran a small performance evaluation measuring the execution time of 3 methods:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <chrono>
#include <random>
#include <exception>
#include <type_traits>
#include <algorithm>     // std::generate
#include <boost/lexical_cast.hpp>

using namespace std;

// 1. A way to easily measure elapsed time -------------------
template<typename TimeT = std::chrono::milliseconds>
struct measure
{
    template<typename F>
    static typename TimeT::rep execution(F const &func)
    {
        auto start = std::chrono::system_clock::now();
        func();
        auto duration = std::chrono::duration_cast< TimeT>(
            std::chrono::system_clock::now() - start);
        return duration.count();
    }
};
// -----------------------------------------------------------

// 2. Define the conversion functions ========================
template<typename T> // A. Using stringstream ================
string StringFromNumber_SS(T const &value) {
    stringstream ss;
    ss << value;
    return ss.str();
}

template<typename T> // B. Using boost::lexical_cast =========
string StringFromNumber_LC(T const &value) {
    return boost::lexical_cast<string>(value);
}

template<typename T> // C. Using c++11 to_string() ===========
string StringFromNumber_C11(T const &value) {
    return std::to_string(value);
}
// ===========================================================

// 3. A wrapper to measure the different executions ----------
template<typename T, typename F>
long long MeasureExec(std::vector<T> const &v1, F const &func)
{
    return measure<>::execution([&]() {
        for (auto const &i : v1) {
            // Comparing against the lexical_cast result keeps the conversion
            // observable (so it can't be optimized away) and doubles as a
            // correctness check.
            if (func(i) != StringFromNumber_LC(i)) {
                throw std::runtime_error("FAIL");
            }
        }
    });
}
// -----------------------------------------------------------

// 4. Machinery to generate random numbers into a vector -----
template<typename T>
typename std::enable_if<std::is_integral<T>::value>::type 
FillVec(vector<T> &v)
{
    std::mt19937 e2(1);
    std::uniform_int_distribution<> dist(3, 1440);
    std::generate(v.begin(), v.end(), [&]() { return dist(e2); });
}

template<typename T>
typename std::enable_if<!std::is_integral<T>::value>::type 
FillVec(vector<T> &v)
{
    std::mt19937 e2(1);
    std::uniform_real_distribution<> dist(-1440., 1440.);
    std::generate(v.begin(), v.end(), [&]() { return dist(e2); });
}
// -----------------------------------------------------------

int main()
{
    std::vector<int> v1(991908);
    FillVec(v1);

    cout << "C++ 11 method ......... " <<
        MeasureExec(v1, StringFromNumber_C11<int>) << endl;
    cout << "String stream method .. " <<
        MeasureExec(v1, StringFromNumber_SS<int>) << endl;
    cout << "Lexical cast method ... " <<
        MeasureExec(v1, StringFromNumber_LC<int>) << endl;

    return 0;
}

A typical output (running Release in VS2013, which implies the /O2 optimization flag) would be:

C++ 11 method ......... 273
String stream method .. 1923
Lexical cast method ... 222

UPDATE

Alternatively, an online run on gcc with

g++ -std=c++11 -Ofast -march=native -Wall -pedantic main.cpp && ./a.out

C++ 11 method ......... 414

String stream method .. 1538

Lexical cast method ... 275

Disclaimer: Results are to be compared with each other, not across machines.

Questions

1. Why is the string stream method consistently the worst (by an order of magnitude)? Should it be viewed as deprecated now that faster alternatives have emerged?

2. Why is lexical cast consistently the best? Can we assume that this is the fastest implementation?

Please feel free to tweak and play with your versions of this code. I'd appreciate your insights on the topic.

PS

The code that was actually run had only one measurement per main(). Here, all 3 were presented together to save space.

Optimization flags are compiler specific or application mandated. I'm just providing the code blocks to perform the tests and expect SO users to chip in with their results or suggestions as to what the optimum configuration per compiler would be (for what it's worth, I provided the flags used here).

The code works for any numeric-to-string conversion (it only takes changing the type of v1 in main). sehe did so for double (mentioned in a comment on his answer). It's a good idea to play with that too.

Nikos Athanasiou
  • what were your compilation options? Any optimization flags? – Red Alert May 02 '14 at 22:34
  • nope, none. But as I said, I'd expect others to copy the code and tweak their own settings to disprove/confirm my results – Nikos Athanasiou May 02 '14 at 22:36
  • Could you please repost the code as single block? The sentences in between can be comments. Also include all the headers involved. As it stands, I'm not very motivated to do any testing of my own because of all the work involved. – Praetorian May 02 '14 at 22:39
  • My Visual Studio 2012 release build results: C++ 11 method 170, String stream method 906, Lexical cast method 126 – Matthias247 May 02 '14 at 22:52
  • Test the stream method with 'std::ios_base::sync_with_stdio(0);' line before and paste the time on your machine, please. – enedil May 02 '14 at 22:58
  • I'd be interested in comparing these against the timing of a straightforward "old fashioned" plain C-like algorithm as well. (Maybe I'm going to test that myself.) It would necessarily lose the templating that C++ allows, but then again this tests for int conversion only anyway. – Jongware May 02 '14 at 23:02
  • @Jongware It's all tweakable. Eg you can change `v1` to `std::vector` and it'll work. As for the "old fashioned" way I believe we'd all be glad to have a benchmark against that as well (even though I'd bet my money on `lexical_cast` instead of `sprintf`) – Nikos Athanasiou May 02 '14 at 23:12
  • So you drop the result of the conversion? I.e. there's no observable behaviour? – dyp May 02 '14 at 23:23
  • profiling unoptimized code isn't that relevant. What runs fastest with no optimization may not run so fast with optimization. For lexical_cast, you can read the code...it's all header. I have an ancient version of boost which looks like it uses stringstream like you, but uses string::swap, presumably to avoid a temporary. – Gort the Robot May 02 '14 at 23:23
  • `std::stringstream` is locale-aware, `to_string` is not. – dyp May 02 '14 at 23:25
  • There is quite some potential for optimization of int -> string conversions, especially if you restrict the formatting "options". For example, see [Alexandrescu's Three Optimizations Tips for C++](https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920) – dyp May 02 '14 at 23:32
  • Wait, you profiled **without optimization flags**? Why would you bother doing that, and why would you think that the result would be of interest to other people? – Yakk - Adam Nevraumont May 03 '14 at 00:30
  • Anyways, I answered both your questions. Bear in mind, ---silly--- synthetic benchmarks be silly. – sehe May 03 '14 at 01:37
  • @Yakk I'm providing an initial body of code to do such a measuring. If I had everything figured out this wouldn't be a question... I'd blog about it. A "don't bother others" case would be if I'd plainly asked which is the best (without me doing any coding); asking other people to tweak optimization flags is not much to ask (or at least something should be asked). All this bitching makes me wish we were more like the Python community ... I mean seriously not even ONE good/encouraging/mentor-like comment, not ONE. – Nikos Athanasiou May 03 '14 at 06:41
  • @NikosAthanasiou: Evaluating the performance of completely unoptimized C++ code is kind of pointless and stating which optimization level you used is not tweaking. In your question you say "release version" but in a comment you say "nope, none" so it's not clear what you're measuring. I don't think Yakk's concern qualifies as bitching. – Blastfurnace May 03 '14 at 08:24
  • @NikosAthanasiou: Your edit makes it clearer that you are measuring an optimized build. Thanks. It would have avoided some of the comments in the first place. By the way, the `/Ox` timings don't add much to your question and I would call them unnecessary "tweaking" :) – Blastfurnace May 03 '14 at 10:05
  • Thanks for the work and source. Could you please edit your question to give us the version of Boost you used for this test? – paercebal May 12 '14 at 10:12

3 Answers

34

Question 1. Why is the string stream method consistently the worst?

The classical mistake: creating a new stringstream every single time

template<typename T> // 1. Using stringstream
string StringFromIntegral_SS(T const &value) {
    thread_local stringstream ss; // constructed once per thread, reused across calls
    ss.str("");                   // drop the previous contents
    ss.clear();                   // reset any error/eof flags
    ss << value;
    return ss.str();
}

Question 2. Why is lexical cast consistently the best? Can we assume that this is the fastest implementation?

Because it's the most specialized; and no, faster implementations exist. FastFormat and Boost Spirit have competitive offerings, as far as I know.

Update: Boost Spirit Karma still easily beats the bunch:

template<typename T> // 4. Karma to string
std::string StringFromIntegral_K(T const &value) {
    thread_local auto const gen = boost::spirit::traits::create_generator<T>::call();
    thread_local char buf[20];
    char* it = buf;
    boost::spirit::karma::generate(it, gen, value);
    return std::string(buf, it);
}

Timings:

C++ 11 method 111
String stream method 103
Lexical cast method 57
Spirit Karma method 36
Spirit Karma method with string_ref 13

See it Live On Coliru Clang or GCC


BONUS

Just to goof off, a version using boost::string_ref is much faster still due to the reduced allocations:

template<typename T> // 5. Karma to string_ref
boost::string_ref StringFromIntegral_KSR(T const &value) {
    thread_local auto const gen = boost::spirit::traits::create_generator<T>::call();
    thread_local char buf[20];
    char* it = buf;
    boost::spirit::karma::generate(it, gen, value);
    return boost::string_ref(buf, it-buf);
}
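
One usage note on this variant: the returned string_ref aliases the thread_local buffer, so it only stays valid until the next conversion on the same thread. A minimal sketch of the caveat (the variable names are mine, for illustration):

// Given StringFromIntegral_KSR above:
boost::string_ref ref = StringFromIntegral_KSR(42);  // views "42" inside the thread_local buf
std::string owned(ref.data(), ref.size());           // copy it out if it must outlive the next call
StringFromIntegral_KSR(1440);                        // buf is overwritten; ref no longer views "42"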

I've tested all modified methods for correctness using an asserting test loop:

return measure<>::execution(
    //[&]() { for (auto const &i : v1) { func(i); }});
    [&]() { for (auto const &i : v1) { assert(func(i) == StringFromIntegral_LC(i)); }});
sehe
  • Added another candidate implementation, that is faster still than the Karma version. It clocks in at ~4x the lexical cast. _[Silly Benchmark Disclaimer]_. Of course, any real wins depend on the real usage patterns. – sehe May 03 '14 at 02:06
  • +1 Excellent. About the `stringstream` mistake, even the implementation [linked from isocpp](http://codexpert.ro/blog/2014/04/14/standard-way-of-converting-between-numbers-and-strings-in-cpp11/) does it. It's nice to finally see the **correct** way. As for the asserting loop, it solves the problem mentioned by @dyp on observable behaviour right? – Nikos Athanasiou May 03 '14 at 06:50
  • I wouldn't say this is the correct way:) instead, I'd use one of the existing implementations - std::to_string, most likely - until my profiler tells me otherwise. And then I'd not use string_stream except for concatenating more than one item of data. – sehe May 03 '14 at 07:12
  • Results when executed with msvs2017; C++ 11 method 26, String stream method 148, Lexical cast method 203, Spirit Karma method 28, Spirit Karma method with string_VIEW 21 – boojum Mar 15 '18 at 06:34
  • @boojum [MSVC 19.00.23506 for x64 live](http://rextester.com/CHVFTR67651): C++ 11 method 222 String stream method 197 Lexical cast method 258 Spirit Karma method 42 Spirit Karma method with string_ref 23 (Note the first!) – sehe Mar 15 '18 at 08:01
1

std::format

A new method has been added to our arsenal with C++20, namely std::format. Obtaining a string from a number num is as easy as:

std::format("{}", num);

Since gcc does not support it yet, I extended the original benchmark using fmt:

fmt::format("{}", num);

i.e. the library std::format is based on (and which will probably form the bulk of upcoming implementations), in this online demo. I was disappointed to see it's 4x slower than std::to_string despite the library's advertised speed:

C++ 11 method ......... 56
String stream method .. 1171
Lexical cast method ... 78
Format method ... 210

Maybe I'm not doing this library justice (after all, the reported benchmarks for fmt focus on the print functionality), so I'm leaving the benchmark and report here in case tweaks or corrections can be suggested.
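
For reference, a wrapper in the same shape as the other conversion functions looks roughly like this (a minimal sketch; the function name is mine, the exact code is in the online demo):

#include <fmt/core.h>   // or <format> / std::format where the standard library provides it
#include <string>

template<typename T> // Using fmt::format ===================
std::string StringFromNumber_F(T const &value) {
    return fmt::format("{}", value);
}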

This is a version of the benchmark with sehe's optimized use of stringstream (i.e. as a thread_local variable).


charconv

The method advertised as the fastest modern one, using the charconv header, keeps its promise and beats the competition, with relative timings like:

C++ 11 method ......... 13
String stream method .. 1044
Lexical cast method ... 23
Format method ... 152
Charconv method ... 11

As shown in the following Demo, a major difference of the charconv API is that it has the user preallocate the result buffer (where the output is placed), which can even live in automatic storage (the stack). If one goes against the grain and (unlike what's shown in the demo above) creates strings on the fly, e.g. because each "new object" needs to be stored:

template<typename T> // D. Using c++17 to_chars() ===========
std::string StringFromNumber_CharConv(T const &value) { // needs <charconv> and <array>
    std::array<char, 32> ret; // comfortably large for any built-in integral, sign included
    auto [ptr, ec] = std::to_chars(ret.data(), ret.data() + ret.size(), value);
    return std::string(ret.data(), ptr);
}

then the performance of to_chars matches that of std::to_string. So, two thoughts on that:

  1. Either memory allocation is the determining factor here, and this benchmark is not fine-grained enough to distinguish between to_string and to_chars (one way to probe this is sketched after this list).
  2. Or std::to_string has already incorporated advances made by charconv.
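
One way to probe the first thought (a hypothetical helper of mine, not part of the benchmark): reuse a single thread_local string, so the per-call heap allocation disappears while the conversion work stays the same. If this runs markedly faster than StringFromNumber_CharConv above, allocation is indeed the determining factor.

#include <charconv>
#include <string>

template<typename T> // reuse one buffer string per thread (sketch)
std::string const &StringFromNumber_CharConvReuse(T const &value) {
    thread_local std::string out;
    out.resize(32); // capacity is kept across calls, so no reallocation after warm-up
    auto const res = std::to_chars(out.data(), out.data() + out.size(), value);
    out.resize(static_cast<std::size_t>(res.ptr - out.data())); // keep only the written characters
    return out;
}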

That said, it seems that if one needs to go as fast as possible, charconv provides the API to do so, since it doesn't enforce memory allocation and the "core" conversion algorithm is apparently as fast as it gets.
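
To illustrate going with the grain instead (again a sketch under my own assumptions, not the code of the linked demo): convert straight into a stack buffer and consume the characters in place, so no allocation happens at all.

#include <charconv>
#include <cstdio>
#include <system_error>

int main() {
    char buf[32]; // comfortably large for any built-in integral
    int value = 1440;
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, value);
    if (ec == std::errc{}) // success: [buf, ptr) holds the characters, no null terminator
        std::printf("%.*s\n", static_cast<int>(ptr - buf), buf);
    return 0;
}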

The determining factor will of course be the length of the numbers we are converting, since different algorithms may shine for different input. The (at most) 4-digit integers we use in the benchmark might not leave much room for charconv to show its power, something that large floating point numbers would showcase better.

Nikos Athanasiou
0

I added absl::StrCat as an option, made sure the code would handle floating-point as well, and then ran the benchmark loop 3 times.

I also ran them on both clang and gcc.

See https://godbolt.org/z/7feoaEEKr
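
A wrapper for StrCat in the same shape as the question's conversion functions might look like the sketch below (the wrapper name is mine; the exact code is in the godbolt link above):

#include <string>
#include "absl/strings/str_cat.h"

template<typename T> // Using absl::StrCat
std::string StringFromNumber_SC(T const &value) {
    return absl::StrCat(value); // StrCat accepts numeric arguments directly
}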

For int on gcc, the results I got were:

C++ 11 method ......... 81
String stream method .. 112
Lexical cast method ... 96
Format method ......... 212
StrCat method ......... 608

C++ 11 method ......... 81
String stream method .. 131
Lexical cast method ... 143
Format method ......... 704
StrCat method ......... 118

C++ 11 method ......... 103
String stream method .. 153
Lexical cast method ... 633
Format method ......... 191
StrCat method ......... 91

I don't know what's going on with the benchmarks sometimes getting much slower results, but I suspect there's some allocator housekeeping caused by all the memory allocation / deallocation.

jorgbrown