
Before I begin, I need to state that my application uses lots of strings, which are on average quite small, and which do not change once created.

In Visual Studio 2010, I noticed that the capacity of std::string is at least 30. Even if I write std::string str = "test";, the capacity of str is 30. Calling str.shrink_to_fit() does not change this, although the function of the same name for std::vector works as expected, namely decreasing the capacity so that capacity == size.

  1. Why does std::string::shrink_to_fit() not work as expected?
  2. How can I ensure that the string allocates the least amount of memory?
πάντα ῥεῖ
Fabian
  • Related: http://stackoverflow.com/questions/2916358/immutable-strings-vs-stdstring – JimmyB Jan 06 '16 at 13:20
  • You could use a custom allocator for `std::basic_string`. However, doing so will create a string type that is not compatible with `std::string`. – kfx Jan 06 '16 at 13:24
  • This sounds like the small buffer optimization with a rather larger small buffer. If it's the small buffer optimization then you can't make it smaller. – Cheers and hth. - Alf Jan 06 '16 at 13:24
  • This is probably caused by "short [or small] string optimisation" and unfortunately for you, it's an optimisation for speed rather than space. I doubt that it can be disabled, but I wouldn't mind being wrong. – molbdnilo Jan 06 '16 at 13:25
  • Are the values for your strings compile or runtime constant? – Superlokkus Jan 06 '16 at 13:42
  • Note that on a 64-bit system, you can't realistically implement `std::basic_string` without at least 24 bytes (3 pointers) in the string object itself. It's quite possible this is SSO, and the only space you're wasting is 8 bytes on stack (which could even be wasted anyway for alignment). – Angew is no longer proud of SO Jan 06 '16 at 13:52
  • @molbdnilo If implemented correctly, SSO is an optimization for **both** space and time. A good implementation uses for the small buffer the same space that would otherwise be used for three pointers. – Ilya Popov Jan 06 '16 at 15:05
  • Here is a summary of the SSO characteristics of various string implementations: http://stackoverflow.com/a/28003328/576911 – Howard Hinnant Jan 06 '16 at 17:06
  • @IlyaPopov So I *was* wrong. That's good. – molbdnilo Jan 06 '16 at 17:32

4 Answers

  1. Your std::string implementation most likely uses some form of the short string optimization resulting in a fixed size for smaller strings and no effect for shrink_to_fit. Note that shrink_to_fit is non-binding for the implementation, so this is actually conforming.
  2. You could use a vector<char> to get more precise memory management, but you would lose some of the additional functionality of std::string. You could also write your own string wrapper which uses a vector internally.
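
A minimal sketch of the vector<char> idea from point 2. The helper names `to_compact` and `to_std_string` are made up for illustration. Constructing the vector from an iterator range usually allocates exactly as many bytes as there are characters, although the standard does not strictly promise capacity() == size(). Note also that the vector object itself typically holds three pointers, so for very short strings this may not save anything compared to a string that fits into the SSO buffer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copy the characters into a vector sized to the content.
// No terminating '\0' is stored.
std::vector<char> to_compact(const std::string& s) {
    std::vector<char> v(s.begin(), s.end());
    assert(v.size() == s.size());  // sanity check: every character was copied
    return v;
}

// Hypothetical helper: rebuild a std::string when its full interface is needed.
std::string to_std_string(const std::vector<char>& v) {
    return std::string(v.begin(), v.end());
}
```

Whether this actually reduces memory use depends on the implementation and on the string lengths, so it is worth measuring on real data.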
pmr

One reason that std::string::shrink_to_fit() does nothing is that the standard does not require it to:

Remarks: shrink_to_fit is a non-binding request to reduce capacity() to size(). [ Note: The request is non-binding to allow latitude for implementation-specific optimizations. —end note ]

If you want to make sure the string shrinks, then you can use the swap() trick:

std::string(string_to_shrink).swap(string_to_shrink)

Another reason this may not work is that the implementer of std::string is allowed to use the short string optimization, so you could always have a minimum capacity of 30 on your implementation.
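
As a hedged sketch, the swap trick can be wrapped in a small function (the name `shrink_with_swap` is invented here). It only helps for strings whose buffer lives on the heap. For a string that fits into the SSO buffer, the capacity cannot drop below the size of that built-in buffer, whatever you do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy-construct a temporary (which normally allocates
// only what the content needs) and swap its buffer into s.
void shrink_with_swap(std::string& s) {
    const std::size_t old_size = s.size();
    std::string(s).swap(s);
    assert(s.size() == old_size);  // contents are preserved; only capacity may change
}
```

For example, a 1000-character string that was reserved to 5000 characters would typically end up with a capacity close to 1000 after the call, while a string such as "test" keeps the implementation's minimum capacity.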

NathanOliver
  • The swap trick is essentially what `shrink_to_fit` was intended to replace. I think the issue here is the SSO. – pmr Jan 06 '16 at 13:27

What you observe is a result of SSO (short string optimization), as pointed out by others.

What you could do about it depends on the usage pattern:

  • If your strings are parts of one big string, which is typical for parsing, you can use classes like std::experimental::string_view, GSL string_span, Google's StringPiece, LLVM's StringRef etc., which do not store data themselves but only refer to a piece of some other string, while providing an interface similar to std::string.

  • If there are multiple copies of the same strings (especially long ones), it may make sense to use CoW (copy-on-write) strings, where copies share the same buffer through reference counting until one of them is modified. (But be aware of the downsides.)

  • If the strings are very short (just a few chars) it may make sense to write your own specialized class, something in line with Handling short codes by Andrzej
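
To illustrate the first option, here is a minimal sketch that assumes a C++17 compiler, where std::experimental::string_view is available as std::string_view. The function name `split_views` is made up for this example. The returned views own nothing; they are only valid while the original string is alive and unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split text at sep and return views into the
// original buffer instead of allocating one std::string per piece.
std::vector<std::string_view> split_views(std::string_view text, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        assert(start <= text.size());  // invariant: substr() below stays in range
        const std::size_t pos = text.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last (possibly empty) piece
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;  // continue after the separator
    }
    return parts;
}
```

Each view typically costs one pointer plus one length, regardless of how long the referenced piece is.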

Whichever approach you choose, it is important to establish a good benchmarking procedure so you can clearly see what effect (if any) you get.

Upd: after rereading the introduction to the question, I think the third approach is the best for you.
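
A rough sketch of what such a specialized class could look like. This is not the code from the linked article; the name `ShortCode` and its interface are assumptions made for illustration. It stores up to N characters inline, never allocates, and does not support embedded '\0' characters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical fixed-capacity immutable string: N bytes, no heap allocation.
template <std::size_t N>
class ShortCode {
public:
    explicit ShortCode(const std::string& s) {
        if (s.size() > N) throw std::length_error("ShortCode: input too long");
        assert(s.find('\0') == std::string::npos);  // embedded NULs are not supported
        buf_.fill('\0');                            // unused tail acts as padding
        std::memcpy(buf_.data(), s.data(), s.size());
    }

    // Length is the number of characters before the first padding byte.
    std::size_t size() const {
        std::size_t n = 0;
        while (n < N && buf_[n] != '\0') ++n;
        return n;
    }

    std::string str() const { return std::string(buf_.data(), size()); }

    // Padding is always zeroed, so comparing the whole buffer is enough.
    friend bool operator==(const ShortCode& a, const ShortCode& b) {
        return a.buf_ == b.buf_;
    }

private:
    std::array<char, N> buf_;
};
```

On common implementations an object of type ShortCode<8> occupies 8 bytes, compared to the 24 or more bytes of a typical std::string object. The trade-off is a hard upper limit on the length.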

Ilya Popov

If you are using a lot of small strings in your application then you might want to take a look at fbstring (https://github.com/facebook/folly/blob/master/folly/docs/FBString.md).

hungptit