How much performance difference when using string vs char array?

Question

I have the following code:

char fname[255] = {0};
snprintf(fname, 255, "%s_test_no.%d.txt", baseLocation, i);

vs

std::string fname = baseLocation + "_test_no." + std::to_string(i) + ".txt";

Which one performs better? Does the second one involve temporary creation? Is there any better way to do this?

How do you measure the performance of something that happens once and takes zero time? — Kerrek SB, Feb 21 '14 at 22:25
Unless you call that code several million times, you'll be hard pressed to notice the difference. Measure, yes, but notice, not so much. That said, there is a good chance that the second one will take longer because of the creation of temporary objects, but a good compiler is likely to optimize a lot of that away. — Timo Geusch, Feb 21 '14 at 22:28

AndyG · Accepted Answer · 2022-01-10T19:55:08.447

Let's run the numbers:

2022 edit:

Using Quick-Bench with GCC 10.3 and compiling with C++20 (with some minor changes for constness) shows that std::string is now faster, by almost 3x.

Original answer (2014)

The code (I used PAPI Timers)

main.cpp

#include <iostream>
#include <string>
#include <stdio.h>
#include "papi.h"
#include <vector>
#include <cmath>
#define TRIALS 10000000

class Clock
{
  public:
    typedef long_long time;
    time start;
    Clock() : start(now()){}
    void restart(){ start = now(); }
    time usec() const{ return now() - start; }
    time now() const{ return PAPI_get_real_usec(); }
};


int main()
{
  int eventSet = PAPI_NULL;
  PAPI_library_init(PAPI_VER_CURRENT);
  if(PAPI_create_eventset(&eventSet)!=PAPI_OK) 
  {
    std::cerr << "Failed to initialize PAPI event" << std::endl;
    return 1;
  }

  Clock clock;
  std::vector<long_long> usecs;

  const char* baseLocation = "baseLocation";
  //std::string baseLocation = "baseLocation";
  char fname[255] = {};
  for (int i=0;i<TRIALS;++i)
  {
    clock.restart();
    snprintf(fname, 255, "%s_test_no.%d.txt", baseLocation, i);
    //std::string fname = baseLocation + "_test_no." + std::to_string(i) + ".txt";
    usecs.push_back(clock.usec());
  }

  long_long sum = 0;
  for(auto vecIter = usecs.begin(); vecIter != usecs.end(); ++vecIter)
  {
    sum+= *vecIter;
  }

  double average = static_cast<double>(sum)/static_cast<double>(TRIALS);
  std::cout << "Average: " << average << " microseconds" << std::endl;

  //compute variance
  double variance = 0;
  for(auto vecIter = usecs.begin(); vecIter != usecs.end(); ++vecIter)
  {
    variance += (*vecIter - average) * (*vecIter - average);
  }

  variance /= static_cast<double>(TRIALS);
  std::cout << "Variance: " << variance << " microseconds" << std::endl;
  std::cout << "Std. deviation: " << sqrt(variance) << " microseconds" << std::endl;
  double CI = 1.96 * sqrt(variance)/sqrt(static_cast<double>(TRIALS));
  std::cout << "95% CI: " << average-CI << " usecs to " << average+CI << " usecs" << std::endl;  
}

Toggle the commented-out lines to switch between the two approaches. I ran 10 million iterations of each method on my machine, compiled with:

g++ main.cpp -lpapi -DUSE_PAPI -std=c++0x -O3

Using char array:

Average: 0.240861 microseconds
Variance: 0.196387 microseconds
Std. deviation: 0.443156 microseconds
95% CI: 0.240586 usecs to 0.241136 usecs

Using string approach:

Average: 0.365933 microseconds
Variance: 0.323581 microseconds
Std. deviation: 0.568842 microseconds
95% CI: 0.365581 usecs to 0.366286 usecs

So at least on MY machine with MY code and MY compiler settings, I saw that character arrays give a 34% speedup over strings (I originally reported this as "about a 50% slowdown when moving to strings"), using the following formula:

((time for string) - (time for char array) ) / (time for string)

This gives the difference in time between the two approaches as a percentage of the string time. My original percentage was also correct: it used the character array time as the reference point instead, which shows a 52% slowdown when moving to string, but I found that framing misleading.

I'll take any and all comments for how I did this wrong :)

2015 Edit

Compiled with GCC 4.8.4:

string

Average: 0.338876 microseconds
Variance: 0.853823 microseconds
Std. deviation: 0.924026 microseconds
95% CI: 0.338303 usecs to 0.339449 usecs

character array

Average: 0.239083 microseconds
Variance: 0.193538 microseconds
Std. deviation: 0.439929 microseconds
95% CI: 0.238811 usecs to 0.239356 usecs

So the character array approach remains significantly faster although less so. In these tests, it was about 29% faster.
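For anyone who wants to repeat the comparison without PAPI, a minimal sketch using only the standard library might look like the following. It is not the harness used for the numbers above: it times the whole loop with std::chrono::steady_clock and divides by the trial count, instead of timing each call separately, and all function names here are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Times `trials` calls of build(i) as one block and returns the mean cost
// per call in nanoseconds. Assumption: build returns a size we can sum.
template <typename Build>
double mean_ns_per_call(int trials, Build build) {
    using clock = std::chrono::steady_clock;
    std::size_t sink = 0;  // accumulate results so the work is actually used
    const auto start = clock::now();
    for (int i = 0; i < trials; ++i) {
        sink += build(i);
    }
    const auto stop = clock::now();
    volatile std::size_t keep = sink;  // discourage removal of the loop
    (void)keep;
    return std::chrono::duration<double, std::nano>(stop - start).count() / trials;
}

// Char-array variant from the question; returns the formatted length.
std::size_t build_with_snprintf(int i) {
    static const char* baseLocation = "baseLocation";
    char fname[255] = {};
    const int n = std::snprintf(fname, sizeof fname, "%s_test_no.%d.txt",
                                baseLocation, i);
    return n > 0 ? static_cast<std::size_t>(n) : 0;
}

// std::string variant from the question; returns the resulting length.
std::size_t build_with_string(int i) {
    static const std::string baseLocation = "baseLocation";
    const std::string fname =
        baseLocation + "_test_no." + std::to_string(i) + ".txt";
    return fname.size();
}
```

Calling mean_ns_per_call(10000000, build_with_snprintf) and mean_ns_per_call(10000000, build_with_string) from main gives one figure per approach. The results will depend on the compiler, standard library and flags, so treat them as local measurements only.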

Cheers, I think I have the explanation to the behavior you observed, just take a look at my answer :-) +1 for actually performance testing this. — cmaster - reinstate monica, Feb 21 '14 at 22:51
Make that baselocation 80 character string, declare char fname[255]={} also within cycle. Then make a third test and try std::string also declared outside cycle and use it inside with append or operator+= . I trust that one will win. — Öö Tiib, Aug 05 '15 at 15:48
@ÖöTiib: I've addressed your comments in my post. I performed the timings the way I did originally because I felt it more purely captured what OP was doing. Also, I've updated the timings with the latest version of GCC that I have on Ubuntu. (not the latest overall, though. 5.2 is out as of this time of writing) — AndyG, Aug 06 '15 at 15:28
Sorry, but it does not look like what I suggested. My suggestion was to use 80 character string as a *baselocation* text itself. Other suggestion was not to use operator+ or operator=(string&&) but to use operator+= or string::append instead. The string::reserve has only point if you insert or append to the string. That can be case when you construct multimegabyte texts where such fraction of microsecond matters. Doing reserve before move assignment does not make sense. — Öö Tiib, Aug 07 '15 at 09:40
@ÖöTiib: Apologies for the delay. Thank you for the clarification. I think I see what you're suggesting. Perhaps it would be faster, however I do not suspect that the difference between the string and char array approach will change much. — AndyG, Aug 10 '15 at 16:02
Why is `std::string` faster in C++20? Did they change the implementation in GCC? — digito_evo, Jan 10 '22 at 20:04
@digito_evo: From what I can tell, it's actually the difference between using `std::to_string(i)` versus `snprintf("....%d", ..., i)`. When I add that code back in, the `string` approach is suddenly slower (or not faster) in gcc 5.5/C++11, but faster in gcc 10.3/C++11/C++20. I'll update my answer accordingly. — AndyG, Jan 10 '22 at 20:20
I ran your test 3 times on my old laptop (1st gen i5 with DDR3) with GCC 10.3 and C++20 and `std::string` was 3.4-3.8 times faster. — digito_evo, Jan 10 '22 at 21:01
@SwiftMango: I wish I could read assembly better. The optimizations performed by newer compilers seem to make all the difference. — AndyG, Jan 11 '22 at 14:06
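The variant suggested in the comments above (a std::string declared outside the loop and filled with operator+= instead of operator+) was never posted as code. One possible reading of it is sketched below; build_name is a hypothetical helper, and no timing claim is made for it.

```cpp
#include <string>

// Rebuilds `fname` in place. On common implementations clear() leaves the
// capacity untouched, so repeated calls can reuse the existing buffer
// instead of allocating a new one each time (this is typical behaviour,
// not something the sketch verifies).
void build_name(std::string& fname, const std::string& baseLocation, int i) {
    fname.clear();
    fname += baseLocation;
    fname += "_test_no.";
    fname += std::to_string(i);  // still creates one small temporary string
    fname += ".txt";
}
```

In a benchmark loop, fname would be declared once before the loop and passed to build_name on every iteration.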

cmaster - reinstate monica · Answer 2 · 2014-02-21T22:57:50.963

The snprintf() version will almost certainly be quite a bit faster. Why? Simply because no memory allocation takes place. The new operator is surprisingly expensive, roughly 250ns on my system - snprintf() will have finished quite a bit of work in the meantime.

That is not to say that you should use the snprintf() approach: The price you pay is safety. It is just so easy to get things wrong with the fixed buffer size you are supplying to snprintf(), and you absolutely need to supply code for the case that the buffer is not large enough. So, only think about using snprintf() when you have identified this part of code to be really performance critical.

If your system provides asprintf() (a GNU/BSD extension, not part of POSIX-2008), you may also think about trying it instead of snprintf(): it will malloc() the memory for you, giving you pretty much the same comfort as C++ strings. At least on my system, malloc() is quite a bit faster than the builtin new-operator (don't ask me why, though).

Edit:
Just saw that you used filenames in your example. If filenames are your concern, forget about the performance of string operations! Your code will spend virtually no time in them. Unless you have on the order of 100000 such string operations per second, they are irrelevant to your performance.

Spot on. I even tried to pre-declare and allocate space in the string in my answer, but that actually just caused things to slow down. — AndyG, Feb 21 '14 at 23:04

Mats Petersson · Answer 3 · 2014-02-21T22:38:47.727

If it's REALLY important, measure the two solutions. If not, use whichever you think makes most sense given the data you have, company/private coding style standards, etc. Make sure you use an optimised build [with the same optimisation you are going to use in the actual production build; don't pick -O3 just because it is the highest if your production build uses -O1].

I expect that either will be pretty close if you only do a few. If you have several millions, there may be a difference. Which is faster? I'd guess the second [1], but it depends on who wrote the implementation of snprintf and who wrote the std::string implementation. Both certainly have the potential to take a lot longer than you would expect from a naive approach to how the function works (and possibly also run faster than you'd expect)

[1] Because I have worked with printf, and it's not a simple function: it spends a lot of time parsing and interpreting the format string. It's not very efficient (and I have looked at the ones in glibc and such too, and they are not noticeably better). On the other hand, std::string functions are often inlined since they are template implementations, which improves the efficiency. The joker in the pack is the memory allocation for std::string that is likely to happen. Of course, if baselocation somehow turns out to be rather large, you probably don't want to store it as a fixed size local array anyway, so that evens out in that case.

You seem to know more about this than I do. Care to comment on my post to see how I could make the timings more fair? — AndyG, Feb 21 '14 at 23:13
@Mats Petersson .. all true, but you might already know there are a few optimised printf's available as open-source, where esoteric formatting is sacrificed for performance. And the pink elephant in the corner is the std::allocator design ... no improvement in that department, after 6 years that I know of... don't make me reveal my sources :) — Chef Gladiator, Aug 17 '20 at 18:47

Ulrich Beckert · Answer 4 · score 2 · answered Jan 24 '22 at 15:15

I would recommend using strcat in that case. It is by far the fastest method:
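The code that accompanied this answer is not present here. Purely as an illustration of what a strcat-based version could look like (this is an assumption, not the answerer's code), here is a sketch with a hypothetical helper that checks the required size before writing:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Builds "<baseLocation>_test_no.<i>.txt" in `out` using strcpy/strcat.
// Returns false and leaves `out` untouched if `cap` bytes are not enough.
bool build_with_strcat(char* out, std::size_t cap,
                       const char* baseLocation, int i) {
    char num[16];  // large enough for any 32-bit int plus the NUL
    std::snprintf(num, sizeof num, "%d", i);

    const std::size_t need = std::strlen(baseLocation) +
                             std::strlen("_test_no.") +
                             std::strlen(num) +
                             std::strlen(".txt") + 1;  // +1 for the NUL
    if (need > cap) {
        return false;
    }

    std::strcpy(out, baseLocation);
    std::strcat(out, "_test_no.");
    std::strcat(out, num);
    std::strcat(out, ".txt");
    return true;
}
```

Note that strcat() performs no bounds checking of its own, which is why the sketch computes the needed size first. Whether this is actually faster than the other approaches would have to be measured.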