8

Unlike std::vector, std::string does not provide a unary constructor that takes a size:

std::string s(size); // ERROR

Is there any difference between:

std::string s(size, '\0');

and

std::string s;
s.resize(size);

in terms of their performance on common implementations?

Will resize initialize the string to all zero characters, or will it leave them with unspecified values?

If all zero, is there any way to construct a string of a given size, but leave the characters with an unspecified value?

Andrew Tomazos
  • What is `size`, is it `size = 0`? – Khalil Khalaf Jul 29 '16 at 20:42
  • 1
    @FirstStep: No, where `size` is some `size_t` that is `> 0`. – Andrew Tomazos Jul 29 '16 at 20:44
  • Well did you run a few implementations through a profiler to gauge their performance? – Captain Obvlious Jul 29 '16 at 20:45
  • @CaptainObvlious: No, I did not. – Andrew Tomazos Jul 29 '16 at 20:46
  • `Will resize initialize the string to all zero characters or will it leave them an unspecified value?` - see documentation, e.g., [here](http://en.cppreference.com/w/cpp/string/basic_string/resize). – davidbak Jul 29 '16 at 20:48
  • According to the standard, `resize(n)`, where `n > size()` will initialize the remainder of the string with characters as if they were initialized via `charT()` – Alejandro Jul 29 '16 at 20:49
  • @AndrewTomazos You can specify an allocator like [this one](http://stackoverflow.com/a/21028912/241631) if you don't want value initialization to occur. But of course, then the type isn't `std::string` anymore ... – Praetorian Jul 29 '16 at 20:55
  • There shouldn't be any obvious performance difference between these two, generally both should be up to linear in final string's length. – Chuanzhen Wu Jul 29 '16 at 20:57
  • My initial concern was that the constructor would do the buffer allocation, zero-initialize the buffer, and then re-initialize it with `\0` (second ctor argument), an unnecessary operation. But I looked at libstdc++'s implementation of the constructor taking a size and initializing value (for the non-SSO case). It allocates a buffer large enough to satisfy the size request and zero-initializes it. Then it performs an if conditional on the initializing value, and only if it's "true"-ish will it then copy that value into the buffer. So the constructor appears to do the optimal thing =) – Alejandro Jul 29 '16 at 21:02

4 Answers

4

There may be a difference: with std::string s(size, '\0');, all of the memory needed for the string can be allocated at once. With the second example, if size is greater than the number of characters that fit in the small string optimization buffer, an extra allocation may have to be performed on an implementation whose default constructor allocates, although this is implementation-dependent. Since the default constructor is noexcept as of C++17, a standard-compliant C++17 implementation is not expected to allocate there, so the second example should not pay that extra cost. Still, the first example is more concise and may be more performant, so it is probably preferable. When calling s.resize(size);, all new characters are value-initialized, i.e. set to char(), which is '\0'. There is no way to construct a string whose characters have unspecified values.

DeepCoder
  • 6
    Why would there be a second allocation in the second case? In both cases, if `size` is greater than the SSO buffer, there'll be one allocation. – Praetorian Jul 29 '16 at 20:51
  • For the same reason that `make_unique()` is faster than a `new` call and calling the constructor. In the first case, all the memory `std::string` needs can be allocated together. However, in the second case, `new` may have to be called twice. That is implementation-defined, however, which I will clarify. – DeepCoder Jul 29 '16 at 20:56
  • 6
    *For the same reason that `make_unique()` is faster then a `new` call and calling the constructor* ... are you thinking of `make_shared`, because there should be no difference with `make_unique`. Anyway, I see that you're assuming default constructing the `std::string` will allocate memory once, which would be very unusual, but allowed. But still, asserting that there is a difference seems wrong. There *may* be a difference, but most likely not, unless you consider initializing a few variables to `0` and then changing them, vs initializing them to `size` to begin with, a difference. – Praetorian Jul 29 '16 at 21:00
  • 4
    `string()` is now `noexcept`, meaning it is forbidden from allocating memory (thankfully). – Howard Hinnant Jul 29 '16 at 21:08
  • Well it is `noexcept` since C++ 17. But I will update my answer once more, as I did not know that before and it will improve the answer. Thanks for spotting that! – DeepCoder Jul 29 '16 at 21:12
  • @HowardHinnant Doesn't that just mean my *libc++fromhell* implementation of `std::string` must swallow `bad_alloc` in the default constructor if my attempt at allocation fails? :) – Praetorian Jul 29 '16 at 21:55
  • @Praetorian: I think I understand all the words individually in your comment. But I have no clue what they mean as a sentence. If it is just humor, no problem. – Howard Hinnant Jul 29 '16 at 22:30
  • @HowardHinnant It was half attempted humor and half curiosity whether a `noexcept` specification actually prevents you from allocating memory for some reason, so I'll explain. If I were implementing `basic_string() noexcept`, couldn't I try to allocate memory anyway and catch `bad_alloc` if the allocation fails? – Praetorian Jul 29 '16 at 22:51
  • @Praetorian: Yes. And then what? If you didn't _need_ the memory to achieve the default state, why allocate it? The default string constructor could also establish network connections. But that would of course also be silly. The `noexcept` on `string` (and `vector`) is placed to send a strong message that this operation won't allocate memory. And we have *not* decorated the list default constructor for the same reason -- it is allowed to allocate a sentinel node. – Howard Hinnant Jul 29 '16 at 23:04
3

The actual answer would be implementation-based, but I'm fairly sure that std::string s(size, '\0'); is faster.

std::string s;
s.resize(size);

According to the documentation for std::string:

1) Default constructor. Constructs empty string (zero size and unspecified capacity).

The default constructor will create a string with an "unspecified capacity". My sense here is that the implementation is free to determine a default capacity, probably in the realm of 10-15 characters (total speculation).

Then in the next line, you will reallocate the memory (resize) with the new size if the size is greater than the current capacity. This is probably not what you want!

If you really want to find out definitively, you can run a profiler on the two methods.

Colin Basnett
  • While it's true it's implementation-based, I'm not aware of any implementation where the default constructor allocates memory. Either it uses SSO and has a small capacity in a stack buffer (no allocation), or it doesn't use SSO and has a zero capacity. C++17 now enforces this by making the default constructor noexcept. So in reality, in almost all implementations, there's no performance difference. – ScottG Feb 23 '21 at 18:12
2

There is already a good answer from DeepCoder.

For the record, however, I'd like to point out that for strings (as for vectors) there are two distinct notions:

  • the size(): it's the number of actual (i.e. meaningful) characters in the string. You can change it using resize() (to which you can provide a second parameter to say what char you want to use as filler if it should be other than '\0')
  • the capacity(): it's the number of characters allocated to the string. It's at least the size but can be more. You can increase it with reserve()

If you're worried about allocation performance, I believe it's better to play with the capacity. The size should really be kept for real chars in the string not for padding chars.

By the way, more generally, s.resize(n) is the same as s.resize(n, char()). So if you'd like to fill it in the same way at construction, you could consider string s(n, char()). But as long as you don't use basic_string<T> with T being something other than a character type, your '\0' does the trick.

Christophe
1

Resize does not leave elements uninitialized. According to the documentation: http://en.cppreference.com/w/cpp/string/basic_string/resize

s.resize(size) will value-initialize each appended character. That will cause each element of the resized string to be initialized to '\0'.

You would have to measure the performance difference of your specific C++ implementation to really decide if there's a worthwhile difference or not.

After looking at the machine code generated by Visual C++ for an optimized build, I can tell you the amount of code for either version is similar. What seems counterintuitive is that the resize() version measured faster for me. Still, you should check your own compiler and standard library.

ChrisG0x20