33

I have a question very similar to

How do I allocate a std::string on the stack using glibc's string implementation?

but I think it's worth asking again.

I want an std::string with local storage that overflows into the free store. std::basic_string provides an allocator as a template parameter, so it seems like the thing to do is to write an allocator with local storage and use it to parameterize the basic_string, like so:

std::basic_string<
char, 
std::char_traits<char>, 
inline_allocator<char, 10> 
> 
x("test");

I tried to write the inline_allocator class that would work the way you'd expect: it reserves 10 bytes for storage, and if the basic_string needs more than 10 bytes, then it calls ::operator new(). I couldn't get it to work. In the course of executing the above line of code, my GCC 4.5 standard string library calls the copy constructor for inline_allocator 4 times. It's not clear to me that there's a sensible way to write the copy constructor for inline_allocator.

In the other StackOverflow thread, Eric Melski provided this link to a class in Chromium:

http://src.chromium.org/svn/trunk/src/base/stack_container.h

which is interesting, but it's not a drop-in replacement for std::string, because it wraps the std::basic_string in a container so that you have to call an overloaded operator->() to get at the std::basic_string.

I can't find any other solutions to this problem. Could it be that there is no good solution? And if that's true, then are the std::basic_string and std::allocator concepts badly flawed? I mean, it seems like this should be a very basic and simple use case for std::basic_string and std::allocator. I suppose the std::allocator concept is designed primarily for pools, but I think it ought to cover this as well.

It seems like the rvalue-reference move semantics in C++0x might make it possible to write inline_allocator, if the string library is re-written so that basic_string uses the move constructor of its allocator instead of the copy constructor. Does anyone know what the prospect is for that outcome?

My application needs to construct a million tiny ASCII strings per second, so I ended up writing my own fixed-length string class based on Boost.Array, which works fine, but this is still bothering me.

Community
  • 1
  • 1
James Brock
  • 3,236
  • 1
  • 28
  • 33
  • 4
    As someone who's wrestled with custom allocators in the past, I'm very interested in hearing an authority speak on this topic. – Don Neufeld Mar 30 '11 at 21:48
  • +1 for "free store". Oh, and because it's a good question. – Lightness Races in Orbit Mar 30 '11 at 22:51
  • 1
    The link to [Generic: A Policy-Based basic_string Implementation](http://drdobbs.com/184403784) is terrific discussion of basic_string storage. I still wonder, though, if it is possible to write the `inline_allocator` class in my question. A brief glance at Alexandresu's article doesn't tell me the answer, I'll have to spend some time with it. – James Brock Mar 30 '11 at 23:51
  • The link to Chromium's stack_container.h is dead. Here's an updated one: http://src.chromium.org/viewvc/chrome/trunk/src/base/containers/stack_container.h?view=markup – Arto Bendiken Jul 31 '13 at 23:56

4 Answers4

16

Andrei Alexandrescu, C++ programmer extraordinaire who wrote "Modern C++ Design" once wrote a great article about building different string implementations with customizable storage systems. His article (linked here) describes how you can do what you've described above as a special case of a much more general system that can handle all sorts of clever memory allocation requirements. This doesn't talk so much about std::string and focuses more on a completely customized string class, but you might want to look into it as there are some real gems in the implementation.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • That's a great article by Alexandrescu, thanks. This is a very good answer to the question of how to customize storage for basic_string. – James Brock Mar 30 '11 at 23:35
10

C++2011 is really going to help you here :)

The fact is that the allocator concept in C++03 was crippled. One of the requirement was that an allocator of type A should be able to deallocate memory from any other allocator from type A... Unfortunately this requirement is also at odds with stateful allocators each hooked to its own pool.

Howard Hinnant (who manages the STL subgroup of the C++ commitee and is implementing a new STL from scratch for C++0x) has explored stack-based allocators on his website, which you could get inspiration from.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 1
    That's very interesting. I looked at Hinnant's `stack_alloc` class and it works splendidly for `std::vector` but alas, not for `std::basic_string`. The problem is again, I believe, in the way `std::basic_string` uses the `stack_alloc` copy constructor. – James Brock Mar 31 '11 at 13:22
  • And I think it is permitted to use the copy constructor that way because of the requirement you described. – James Brock Mar 31 '11 at 13:29
  • @James: yes, unfortunately since the standard all but imply that the allocators are stateless (or at the very least that all allocators of the same type more or less share the same state, somehow), copies are perfectly allowed. – Matthieu M. Mar 31 '11 at 13:35
6

This is generally unnecessary. It's called the "short string optimization", and most implementations of std::string already include it. It may be hard to find, but it's usually there anyway.

Just for example, here's the relevant piece of sso_string_base.h that's part of MinGW:

  enum { _S_local_capacity = 15 };

  union
  {
_CharT           _M_local_data[_S_local_capacity + 1];
size_type        _M_allocated_capacity;
  };

The _M_local_data member is the relevant one -- space for it to store (up to) 15 characters (plus a NUL terminator) without allocating any space on the heap.

If memory serves, the Dinkumware library included with VC++ allocates space for 20 characters, though it's been a while since I looked, so I can't swear to that (and tracking down much of anything in their headers tends to be a pain, so I prefer to avoid looking if I can).

In any case, I'd give good odds that you've been engaged in that all-too-popular pass-time known as premature optimization.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • 1
    VC++ allocates 16 bytes for the SSO, which in my opinion is too small to be generally useful as `std::wstring` is more commonly used in Windows code than `std::string` and that only allows for 8 characters in a `std::wstring` (or 7 if `.c_str()` is used frequently). – ildjarn Mar 30 '11 at 23:04
  • Doesn't the header belong to MinGW library extensions (not part of std::string)? – UncleBens Mar 30 '11 at 23:14
  • @UncleBens: I didn't think so, but I could be mistaken. There are also alternatives (e.g., STLPort, libc++) that definitely do use SSO in their implementations of `std::string`. – Jerry Coffin Mar 30 '11 at 23:29
  • @ildjarn: Looking again, you're quite right, and I agree that is a bit on the small size. I still think it would be easier to modify it and re-compile the standard library than try to hack it on afterwards though. – Jerry Coffin Mar 30 '11 at 23:35
  • Ah, thanks for the term "short string optimization." I need to be able to control the threshold length of the short string optimization. And I'm not kidding about the million strings constructions per second. – James Brock Mar 30 '11 at 23:40
  • 1
    @James Brock: Surely -- and I never doubted it. As for controlling the threshold, in most cases it's a matter of finding the `enum` in the header, modifying it as needed, and (possibly) re-compiling the standard library to match (but since it's a template, there's a good chance you only need to modify the header). – Jerry Coffin Mar 30 '11 at 23:44
  • @Jerry Coffin : Speaking for VC++, pre-VC++ 2010, `std::basic_string` and `std::basic_string` are pre-instantiated and exported from the CRT rather than being header-only, making changing the size of the SSO quite painful. For VC++ 2010, `std::basic_string<>` is now purely header-only, making it trivial, as you say. – ildjarn Mar 31 '11 at 00:08
2

I believe the code from Chromium just wraps things into a nice shell. But you can get the same effect without using the Chromium wrapper container.

Because the allocator object gets copied so often, it needs to hold a reference or pointer to the memory. So what you'd need to do is create the storage buffer, create the allocator object, then call the std::string constructor with the allocator.

It will be a lot wordier than using the wrapper class but should get the same effect.

You can see an example of the verbose method (still using the chromium stuff) in my question about stack vectors.

Community
  • 1
  • 1
Zan Lynx
  • 53,022
  • 10
  • 79
  • 131