4

Ideally, an immutable string class would only need one memory allocation for each string. Even the reference count could be stored in the same chunk of memory that holds the string itself.

A trivial implementation of string and shared_ptr would allocate three distinct pieces of memory for shared_ptr<string const>:

  • Memory for the string buffer
  • Memory for the string object
  • Memory for the reference count

Now, I know that when using std::make_shared(), it is possible for a smart implementation to combine the last two into a single allocation. But that would still leave two allocations.

When you know that the string is immutable, the string buffer won't be reallocated, hence it should be possible to integrate it with the string object, leaving only one allocation.

I know that some string implementations already use such optimizations for short strings, but I'm after an implementation that does this regardless of string length.

My questions are: Is my reasoning sound? Is an implementation actually permitted and able to do this? Can I reasonably expect from a good quality standard library to implement this optimization? Do you know of contemporary library implementations which do this?

Or is this something I would have to implement myself?

sh-
  • 941
  • 6
  • 13
  • GCC 4.x had reference counted `std::string`: https://stackoverflow.com/questions/12520192/is-stdstring-refcounted-in-gcc-4-x-c11 . Later versions of GCC still have it if you compile with `-D_GLIBCXX_USE_CXX11_ABI=0`. – John Zwinck Jun 29 '17 at 11:32
  • have a look at [allocate_shared](http://en.cppreference.com/w/cpp/memory/shared_ptr/allocate_shared) – Caleth Jun 29 '17 at 11:35
  • @Caleth That won't help, it works much like `make_shared` except it uses an explicit allocator. – Some programmer dude Jun 29 '17 at 11:39
  • 3
    There is a reference-counted immutable string in the standard library. It's spelled `std::runtime_error`. – T.C. Jun 29 '17 at 11:43
  • @T.C. That made me almost LOL... :) But the string is still *separatley allocated*. So still two allocations: One for the object and one for the string. – Some programmer dude Jun 29 '17 at 11:48
  • Perhaps it's time to send a proposal for the C++ standards committee about a `std::immutable_shared_ptr` class with a `std::make_immutable_shared_ptr` function that can allocate all data as a single block of memory? And make a Boost implementation as a proof-of-concept? – Some programmer dude Jun 29 '17 at 12:03
  • @Someprogrammerdude I *think* that my custom allocator below is roughly correct – Caleth Jun 29 '17 at 12:07
  • @Someprogrammerdude No, `runtime_error` handles the reference counting for you, so it stands in for the `shared_ptr` :) – T.C. Jun 29 '17 at 12:39
  • @T.C. Semantically immutable objects can support built-in reference counting, via a hidden `mutable` member, whose existence is not exposed to applications. – Kaz Jun 29 '17 at 12:45
  • @Kaz What does that have to do with anything I wrote? – T.C. Jun 29 '17 at 14:21

2 Answers2

1

I believe that the only way to do this is a make_shared which accepts arrays of run-time variable size. The standard one does not, even as of c++17 (which adds support for shared_ptr to arrays).

Boost, on the other hand, has boost::make_shared, which can take an array size parameter as well. Once you have that, you're golden; you get a shared_ptr<char[]> which does pretty much what you want (besides actually being a std::string.

If you don't want to use boost you could just roll your own. It probably wouldn't be that hard.

Something else to consider is that if you will only ever create O(1) strings, it will be much faster to just never delete them, and pass around raw pointers (or std::string_views) instead. This avoids any copying, or fiddling with reference counts. (The reference counts are actually quite slow, since they use atomic operations.)

You could also use an interning mechanism like std::unordered_set<std::string>.

Dan
  • 12,409
  • 3
  • 50
  • 87
  • If using C++17, you could wrap your `boost::shared_ptr` in a simple class which gets you easy access to a `std::string_view`. – aschepler Jun 29 '17 at 12:09
  • Thanks for pointing this out. Not having the interface of a std::string is a definite drawback, however. I thought about using std::string_view, but how do I actually get at the size of the array? – sh- Jun 29 '17 at 12:14
  • @sh- Hm, it looks like you actually can't. I guess you would have to use null-terminated strings if you did it this way. – Dan Jun 29 '17 at 12:19
  • Or store the length manually in the first few bytes of the string. At that point I might as well implement the entire thing myself. – sh- Jun 29 '17 at 12:23
  • @Dan: I do want to eventually delete the strings, once they're not used anymore. They are typically not short (often around 1k chars) and it isn't easy to devise a single owner. However, they do not change, once constructed. So shared ownership is the first thing that comes to mind. – sh- Jun 29 '17 at 16:12
0

You'd probably need to use a custom allocator for all of the allocation.

class ImmutableStringAllocator;

template<typename CharT>
using immutable_string = std::basic_string<CharT, std::char_traits<CharT>, ImmutableStringAllocator>

template<size_t N>
immutable_string<char> make_immutable_string(char (&data)[N])
{
    ImmutableStringAllocator alloc(N);
    // going for basic_string::basic_string(charT *, size_t, Allocator)
    return allocate_shared<immutable_string<char>>(alloc, data, N, alloc);
}

class ImmutableStringAllocator {
    size_t len;
    size_t offset;
    char * buf;
    std::reference_wrapper<char *> ref;
public:
    // Normal Allocator stuff here
    ImmutableStringAllocator(size_t N) : len(N), buf(nullptr), offset(0), ref(buf) {}

    ImmutableStringAllocator(const ImmutableStringAllocator & other) : len(other.len), buf(nullptr), offset(other.offset), ref(other.buf) {}

    ImmutableStringAllocator operator=(const ImmutableStringAllocator & other) 
    { 
         assert(buf == nullptr); 
         temp(other); 
         swap(*this, temp); 
         return *this; 
    }

    pointer allocate(size_type n, const_void_pointer hint)
    {
         if (!ref.get()) { buf = ::new(n + len); offset = n; return buf; }
         return ref.get() + offset;
    }
}
Caleth
  • 52,200
  • 2
  • 44
  • 75
  • 1
    Isn't the allocator used by the shared_ptr to allocate the string object different than the one used by the string to allocate its storage? – Dan Jun 29 '17 at 11:55
  • According to cppreference, `allocate_shared` uses a copy of the allocator. – Dan Jun 29 '17 at 12:22