
I was testing various approaches to formatting doubles in C++, and here's the code I came up with:

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>
#include <sstream>
#include <iostream>

// Returns the current time in seconds from a monotonic clock
inline long double currentTime()
{
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<long double>(now).count();
}

int main()
{
    // Fill the test set with 10000 random doubles of widely varying magnitude
    std::mt19937 mt(std::random_device{}());
    std::normal_distribution<long double> dist(0, 1e280);
    const auto rng=[&](){return dist(mt);};
    std::vector<double> numbers;
    for(int i=0;i<10000;++i)
        numbers.emplace_back(rng());

    const int precMax=200;
    const int precStep=10;

    char buf[10000];

    // Time snprintf at each precision
    std::cout << "snprintf\n";
    for(int precision=10;precision<=precMax;precision+=precStep)
    {
        const auto t0=currentTime();
        for(const auto num : numbers)
            std::snprintf(buf, sizeof buf, "%.*e", precision, num);
        const auto t1=currentTime();
        std::cout << "Precision " << precision << ": " << t1-t0 << " s\n";
    }

    std::cout << "ostringstream\n";
    for(int precision=10;precision<=precMax;precision+=precStep)
    {
        std::ostringstream ss;
        ss.precision(precision);
        ss << std::scientific;
        const auto t0=currentTime();
        for(const auto num : numbers)
        {
            ss.str("");
            ss << num;
        }
        const auto t1=currentTime();
        std::cout << "Precision " << precision << ": " << t1-t0 << " s\n";
    }
}

What makes me wonder is that for precisions below 40 I get more or less the same performance from both, but beyond that the difference grows to about 2.1x in favor of snprintf. Here's my output on a Core i7-4765T, 32-bit Linux, g++ 5.5.0, libc 2.14.1, compiled with -march=native -O3:

snprintf
Precision 10: 0.0262963 s
Precision 20: 0.035437 s
Precision 30: 0.0468597 s
Precision 40: 0.0584917 s
Precision 50: 0.0699653 s
Precision 60: 0.081446 s
Precision 70: 0.0925062 s
Precision 80: 0.104068 s
Precision 90: 0.115419 s
Precision 100: 0.128886 s
Precision 110: 0.138073 s
Precision 120: 0.149591 s
Precision 130: 0.161005 s
Precision 140: 0.17254 s
Precision 150: 0.184622 s
Precision 160: 0.195268 s
Precision 170: 0.206673 s
Precision 180: 0.218756 s
Precision 190: 0.230428 s
Precision 200: 0.241654 s
ostringstream
Precision 10: 0.0269695 s
Precision 20: 0.0383902 s
Precision 30: 0.0497328 s
Precision 40: 0.12028 s
Precision 50: 0.143746 s
Precision 60: 0.167633 s
Precision 70: 0.190878 s
Precision 80: 0.214735 s
Precision 90: 0.238105 s
Precision 100: 0.261641 s
Precision 110: 0.285149 s
Precision 120: 0.309025 s
Precision 130: 0.332283 s
Precision 140: 0.355797 s
Precision 150: 0.379415 s
Precision 160: 0.403452 s
Precision 170: 0.427337 s
Precision 180: 0.450668 s
Precision 190: 0.474012 s
Precision 200: 0.498061 s

So my main question is: what is the reason for this twofold difference? And additionally, how can I make ostringstream's performance closer to that of snprintf?

NOTE: another question, Why is snprintf faster than ostringstream or is it?, is different from mine. First, there's no specific answer there explaining why formatting a single number at different precisions is slower. Second, that question asks why it's slower in general, which is too broad to answer my question, while this one asks about one specific scenario: formatting a single double.

  • Possible duplicate of [Why is snprintf faster than ostringstream or is it?](https://stackoverflow.com/questions/445315/why-is-snprintf-faster-than-ostringstream-or-is-it) – Flopp Feb 27 '18 at 06:19
  • @Flopp it's not: first, there's no specific answer why formatting of a single number in different precisions is slower. Second, it asks "why it's slower in general", which is too handwavy to make any sense, while my question asks about one specific scenario. – Ruslan Feb 27 '18 at 06:20
  • I suspect you're building a DEBUG build. When I build Release with Visual Studio, the perf numbers are only marginally different between snprintf and ostringstream. – selbie Feb 27 '18 at 06:21
  • @selbie see my remark about compilation options: `-march=native -O3`. It's definitely not debug mode. – Ruslan Feb 27 '18 at 06:21
  • @Ruslan: Then maybe GCC's `stringstream` implementation is crap. Or Visual Studio's `snprintf` implementation is crap. – Nicol Bolas Feb 27 '18 at 06:39
  • C++ streams have been famous for being slow [Does the C++ standard mandate poor performance for iostreams, or am I just dealing with a poor implementation?](https://stackoverflow.com/q/4340396/995714) [printf more than 5 times faster than std::cout?](https://stackoverflow.com/q/12044357/995714), [C++ iostream vs. C stdio performance/overhead](https://stackoverflow.com/q/37894262/995714) – phuclv Feb 27 '18 at 06:53

1 Answer


std::ostringstream's number output ends up calling vsnprintf twice: first to try with a small on-stack buffer, and then, if that buffer turns out to be too small, a second time with a correctly-sized one. See locale_facets.tcc around line 1011 (here std::__convert_from_v is a proxy for vsnprintf):

#if _GLIBCXX_USE_C99_STDIO
    // Precision is always used except for hexfloat format.
    const bool __use_prec =
      (__io.flags() & ios_base::floatfield) != ios_base::floatfield;

    // First try a buffer perhaps big enough (most probably sufficient
    // for non-ios_base::fixed outputs)
    int __cs_size = __max_digits * 3;
    char* __cs = static_cast<char*>(__builtin_alloca(__cs_size));
    if (__use_prec)
      __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                    __fbuf, __prec, __v);
    else
      __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                    __fbuf, __v);

    // If the buffer was not large enough, try again with the correct size.
    if (__len >= __cs_size)
      {
        __cs_size = __len + 1;
        __cs = static_cast<char*>(__builtin_alloca(__cs_size));
        if (__use_prec)
          __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                        __fbuf, __prec, __v);
        else
          __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                        __fbuf, __v);
      }

This exactly matches the observation: for small requested precisions performance is the same as snprintf's, while for larger ones it's twice as poor. It also explains where the knee is: __max_digits is 15 for double (see below), so the first buffer is 15 * 3 = 45 bytes, while %.*e output takes roughly precision + 8 characters (sign, leading digit, decimal point, and an exponent like e+280). So starting from precision around 37 the first conversion no longer fits and everything is converted a second time, which falls exactly between precision 30 and 40, where the timings above begin to diverge.
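
To double-check this explanation, here's a minimal sketch (my own emulation, not libstdc++'s actual code; formatTwoPass is a hypothetical name) of the same strategy built on plain snprintf. Substituting it into the benchmark should reproduce the same knee:

#include <cstdio>
#include <string>

// Emulation of libstdc++'s two-pass strategy (a sketch, not its actual
// code): try a fixed 45-byte buffer first, as __max_digits * 3 gives for
// double, and redo the whole conversion when the result didn't fit.
std::string formatTwoPass(double v, int precision)
{
    char small[45];
    const int len = std::snprintf(small, sizeof small, "%.*e", precision, v);
    if(len < 0)
        return {};                       // encoding error, shouldn't happen here
    if(len < int(sizeof small))
        return std::string(small, len);  // fits: a single conversion, like snprintf
    std::string big(len + 1, '\0');      // too long: convert once more, like ostringstream
    std::snprintf(&big[0], big.size(), "%.*e", precision, v);
    big.resize(len);
    return big;
}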

Moreover, since the size of that first buffer doesn't depend on any property of the std::ostringstream, only on __max_digits, which is defined as __gnu_cxx::__numeric_traits<_ValueT>::__digits10, there doesn't seem to be any natural fix for this short of patching libstdc++ itself. A user-side workaround is sketched below, though.
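
If the stream's locale, width, and fill settings don't matter for your use case, you can do the conversion with a single snprintf call and write the characters into the stream directly. This is only a sketch (putDouble is a name of my choosing), but it sidesteps num_put's two-pass conversion:

#include <cstdio>
#include <sstream>

// Sketch of a user-side workaround: one snprintf call, then hand the
// characters to the stream, bypassing num_put's two-pass conversion.
// Note: this ignores the stream's locale, width and fill settings.
void putDouble(std::ostringstream& ss, double v, int precision)
{
    char buf[512]; // ample for %.*e at any precision up to ~490
    const int len = std::snprintf(buf, sizeof buf, "%.*e", precision, v);
    if(len > 0 && len < int(sizeof buf))
        ss.write(buf, len);
}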

I've reported this as a bug against libstdc++.
