68

I've recently been reading about immutable strings (Why can't strings be mutable in Java and .NET? and Why .NET String is immutable?) as well as some material about why D chose immutable strings. There seem to be many advantages.

  • trivially thread safe
  • more secure
  • more memory efficient in most use cases.
  • cheap substrings (tokenizing and slicing)

Not to mention that most newer languages have immutable strings: D2.0, Java, C#, Python, etc.

Would C++ benefit from immutable strings?

Is it possible to implement an immutable string class in C++ (or C++0x) that would have all of these advantages?


update:

There are two attempts at immutable strings: const_string and fix_str. Neither has been updated in half a decade. Are they even used? Why didn't const_string ever make it into boost?

reevesy
deft_code
  • 43
    A very elaborate and convincing argument you made there, BlueRaja. – peterchen May 26 '10 at 21:04
  • 6
    Well, BlueRaja didn't actually make an argument, as you've all so clearly pointed out. But he might be right, in that C++ is perhaps too much of a hybrid language for purist attempts at an immutable string to find a home. This says more about the C++ culture than the language itself, of course. – Steven Sudit Jun 01 '10 at 16:05
  • 5
    Objection! Ruby's string is not immutable! – Notinlist May 05 '11 at 09:42
  • 1
    They have not been updated since 2005, but [there aren’t many bugs reported](https://sourceforge.net/p/conststring/bugs/), so I think it’s fine to use. – Franklin Yu Feb 05 '18 at 15:23

11 Answers

53

I find that most people in this thread do not really understand what an immutable_string is. It is not only about constness. The real power of immutable_string is in performance (even in a single-threaded program) and memory usage.

Imagine that all strings are immutable and every string is implemented like this:

class string {
    const char* _head ;  // first character; never written through
    size_t _len ;        // number of characters
} ;

How can we implement a substring operation? We don't need to copy any characters. All we have to do is assign _head and _len. The substring then shares the same memory segment as the source string.

Of course, we cannot really implement an immutable_string with only those two data members. A real implementation would probably need a reference-counted (or flyweighted) memory block, like this:

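class immutable_string {
    boost::flyweight<std::string> _s ;  // shared storage (Boost.Flyweight)
    const char* _head ;                 // start of this string's view into _s
    size_t _len ;                       // length of the view
} ;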

Both memory usage and performance would be better than with a traditional string in most cases, especially when you know what you are doing.
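To make the idea concrete, here is a minimal sketch of such a class. It is only an illustration under some assumptions: it swaps the flyweight for a std::shared_ptr<const std::string> as the reference-counted block, and the member names (buf_, head_, len_, owners) are made up for this example rather than taken from const_string or fix_str.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: a reference-counted, immutable string whose
// substrings share the parent's buffer instead of copying characters.
class immutable_string {
public:
    explicit immutable_string(const std::string& s)
        : buf_(std::make_shared<const std::string>(s)),
          head_(buf_->data()),
          len_(buf_->size()) {}

    // O(1): bumps the reference count and adjusts the view.
    // Out-of-range arguments are clamped (a choice made for this sketch).
    immutable_string substr(std::size_t pos, std::size_t count) const {
        if (pos > len_) pos = len_;
        if (count > len_ - pos) count = len_ - pos;
        return immutable_string(buf_, head_ + pos, count);
    }

    const char* data() const { return head_; }
    std::size_t size() const { return len_; }
    std::string str() const { return std::string(head_, len_); }

    // Number of immutable_string objects sharing this buffer.
    long owners() const { return buf_.use_count(); }

private:
    immutable_string(std::shared_ptr<const std::string> buf,
                     const char* head, std::size_t len)
        : buf_(std::move(buf)), head_(head), len_(len) {}

    std::shared_ptr<const std::string> buf_;  // shared, never modified
    const char* head_;                        // start of this view
    std::size_t len_;                         // length of this view
};
```

With this sketch, immutable_string("hello world").substr(6, 5) points into the original buffer; the only per-substring cost is the reference-count update, which is also the part that becomes expensive when strings are shared across threads.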

Of course C++ can benefit from an immutable string, and it would be nice to have one. I have checked the boost::const_string and the fix_str mentioned by Cubbi. Those should be what I am talking about.

jamylak
yoco
26

As an opinion:

  • Yes, I'd quite like an immutable string library for C++.
  • No, I would not like std::string to be immutable.

Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.

  • 4
    The closest I've come to immutable strings in C++ was a "span" class that has two const pointers, one for the begin and one for the end. It did not manage memory, but did support the usual utility functions (find, etc). As a result, it turned out to be very useful for parsing. – Steven Sudit Jun 01 '10 at 16:06
  • 2
    @StevenSudit: Many large projects have that, though it's commonly called `stringref` or similar. – Mooing Duck Apr 28 '13 at 23:25
  • 3
    @MooingDuck That's true. Google calls its version [StringPiece](https://code.google.com/p/re2/source/browse/re2/stringpiece.h) – Steven Sudit Apr 29 '13 at 18:10
  • 2
    There is also string_view in C++17, it appears to be const only. – pilkch Oct 12 '16 at 22:50
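The "span" idea from the comments above might look roughly like the following. This is a sketch, not anyone's actual class: the name char_span and the next_token helper are invented for illustration, and the caller is assumed to keep the underlying buffer alive while the span is in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: two pointers to const char, no memory
// management. The buffer must outlive every span that refers to it.
class char_span {
public:
    char_span(const char* begin, const char* end) : begin_(begin), end_(end) {}
    explicit char_span(const char* cstr)
        : begin_(cstr), end_(cstr + std::strlen(cstr)) {}

    const char* begin() const { return begin_; }
    const char* end() const { return end_; }
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    bool empty() const { return begin_ == end_; }
    std::string str() const { return std::string(begin_, end_); }

    // Pointer to the first occurrence of c, or end() if it is absent.
    const char* find(char c) const { return std::find(begin_, end_, c); }

    // Returns the text before the first delimiter and advances this span
    // past the delimiter. No characters are copied.
    char_span next_token(char delim) {
        const char* p = find(delim);
        char_span token(begin_, p);
        begin_ = (p == end_) ? end_ : p + 1;
        return token;
    }

private:
    const char* begin_;
    const char* end_;
};
```

Tokenizing "a,bc,def" with next_token(',') yields spans for "a", "bc" and "def" without allocating. In C++17, std::string_view covers the same ground.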
9

My conclusion is that C++ does not require the immutable pattern because it has const semantics.

In Java, if you have a Person class and you return the person's String name from the getName() method, your only protection is the immutable pattern. Without it, you would have to clone() your strings all night and day (as you have to do with data members that are not typical value objects but still need to be protected).

In C++ you have const std::string& getName() const. So you can write SomeFunction(person.getName()), where the function is declared as void SomeFunction(const std::string& subject).

  • No copy happened
  • If anyone wants to copy he is free to do so
  • Technique applies to all data types, not just strings
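A minimal sketch of the pattern described above. Person, getName and SomeFunction come from the text; the constructor, the name_ member and the size_t return value are assumptions added so the example has something to check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Read-only access: callers see the member itself but cannot modify it.
    const std::string& getName() const { return name_; }

private:
    std::string name_;
};

// Takes the string by const reference, so no copy is made at the call site.
// (Returning the length is just for this example.)
std::size_t SomeFunction(const std::string& subject) {
    return subject.size();
}
```

Calling SomeFunction(person.getName()) passes a reference straight through. A caller who wants an independent, modifiable string writes std::string copy = person.getName(); and pays for the copy explicitly.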
Notinlist
  • 4
    Correction! Immutable strings can be useful in multi-threaded programs as they have zero overhead on handling concurrency. And most of the time you do not edit your strings, you rather simply replace them. – Notinlist Aug 26 '11 at 12:54
3

You're certainly not the only person who thought that. In fact, there is the const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into boost in mind. And here's a slightly newer library, fix_str by Roland Pibinger. I'm not sure how tricky full string interning at run-time would be, but most of the advantages are achievable when necessary.

Cubbi
3

I don't think there's a definitive answer here. It's subjective, if not because of personal taste then at least because of the type of code one most often deals with. (Still, a valuable question.)

Immutable strings are great when memory is cheap—this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)

You can create an immutable string class in C++, and you can make it largely compatible with std::string—but you will still lose when comparing to a built-in string class with dedicated optimizations and language features.

std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.

Keith Pinson
peterchen
  • 3
    If `std::string` is the best standard string, but you use it *very rarely*, because of its *too many drawbacks*, what **DO** you use? – Evgeniy Berezovsky Jul 15 '12 at 22:08
  • 3
    `CString` (please don't kill me) because of >10 years of accrued libraries, better native API interop (including wchar_t / char conversions). Back then, the well-defined copy-on-write would also be an advantage over std::string's lack of performance guarantees. – peterchen Jul 16 '12 at 05:29
  • @peterchen The CString in MFC or WTL? – Mike Sep 03 '14 at 02:47
  • 1
    @Mike both the MFC and the ATL version since they are source-code-compatible, but two distinct implementations that don't match. It's a major WTF to always have an "ATL" and an "MFC" version of libraries. – peterchen Sep 03 '14 at 09:19
  • 1
    Immutable strings go back at least to the late 1970s if not before; I don't think memory was particularly cheap then. In Applesoft BASIC, Commodore BASIC, or many other implementations, each element in a string array would hold a two-byte pointer and a one-byte length; the string data itself would be stored in a pool with no other overhead. A statement like `A$(4)=A$(6)` would merely copy the length and the pointer; it would not have to copy any data. Microsoft's garbage-collection algorithm was not well implemented (dog slow), but it was possible for code to determine when a GC cycle... – supercat Feb 14 '15 at 23:55
  • 1
    ...would be coming soon and use some PEEKs and POKEs to add "generations" to the garbage collector, or use a third-party GC which was more efficient. Although it was not uncommon for programs to store fixed collections of strings with an overhead of one byte per string or--for ASCII strings--zero (use bit 7 of each byte to say if it's the last), I don't think one could have a mutable array of mixed-length strings with less overhead. The problem with immutable strings is that a GC has to be able to find them, which would be workable in some languages, but not so well in C++. – supercat Feb 15 '15 at 00:04
2
const std::string

There you go. A string literal is also immutable, unless you want to get into undefined behavior.

Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
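A sketch of that last suggestion, using std::shared_ptr<const std::string>. The alias StringRef and the helpers make_string and rebinding_demo are names made up for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical alias: a rebindable handle to a string that cannot be
// modified through the handle.
using StringRef = std::shared_ptr<const std::string>;

StringRef make_string(const std::string& s) {
    return std::make_shared<const std::string>(s);
}

// Shows that rebinding one handle leaves other handles untouched.
std::string rebinding_demo() {
    StringRef a = make_string("hello");
    StringRef b = a;             // shares the same string, no character copy
    a = make_string("goodbye");  // rebinds a; b still refers to "hello"
    return *a + "/" + *b;
}
```

The handle can be reassigned like a Python name, while the string it points to stays const. The price is a heap allocation per string and reference-count traffic on every copy of the handle.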

Mark Ransom
  • 2
    You'd need more than that, e.g. you'd want `std::string::replace` to return a modified copy rather than cause a compile error. – peterchen May 26 '10 at 21:07
  • 1
    @peterchen -> const std::string orig; const std::string copy = std::string(orig).replace(...); - what would an immutable string do that's better? – Edward Strange May 26 '10 at 22:50
  • IMHO assignment of a new string is mutating a string, and from what I remember of APIs that had such a construct this is how they took it too. What you want really does sound more like an assignable reference and it seems to me that something like a smart pointer would be a better answer to that than making a const string that's assignable. I also do find const std::string vars useful from time to time so I'd have to beg to differ there. – Edward Strange May 26 '10 at 22:52
  • 3
    It's not the correct interface for an immutable object, it's two statements instead of one, it's an implementation detail leaking to the calling code? --- An object should make the right thing easy, the wrong thing hard (or impossible). Do I need to put a "don't show this string to other threads" comment between the copy and the replace, and afterwards a "now you can"? --- I agree that `const std::string` is a close approximation, but without some of the benefits. – peterchen May 27 '10 at 10:37
  • @Peter: It might be nice if the language supported two types of `replace`: The current one and `replaced`, where the latter operates on a const reference and returns a copy that has the replacements. The latter might be able to avoid copying everything twice. However, so long as we lack such a function, we're stuck with Noah's work-around, which is a reasonable alternative. The better answer would be full support for an immutable variant of std::string. – Steven Sudit Jun 01 '10 at 17:43
  • @CrazyEddie: Not making a copy of a source string, but using it directly (in an immutable manner). – SasQ Feb 25 '12 at 16:38
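The `replaced` idea from the comments above could be sketched as a free function. This is a hypothetical helper, not part of std::string; it builds the result in one pass rather than copying the whole source and then calling replace on the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns a copy of `source` in which `count`
// characters starting at `pos` are replaced by `with`. `source` itself is
// never modified. Like std::string::replace, it throws std::out_of_range
// when pos is past the end.
std::string replaced(const std::string& source, std::size_t pos,
                     std::size_t count, const std::string& with) {
    if (pos > source.size())
        throw std::out_of_range("replaced: pos out of range");

    std::string result;
    result.reserve(source.size() + with.size());
    result.append(source, 0, pos);  // unchanged prefix
    result.append(with);            // replacement text
    const std::size_t tail = source.size() - pos;
    if (count < tail)               // unchanged suffix, if any
        result.append(source, pos + count, std::string::npos);
    return result;
}
```

Usage stays a single expression on a const object: const std::string copy = replaced(orig, 0, 5, "goodbye");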
1

Immutable strings are great if, whenever it's necessary to create a new string, the memory manager will always be able to determine the whereabouts of every string reference. On most platforms, language support for such an ability could be provided at relatively modest cost, but on platforms without such language support built in, it's much harder.

If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, it would be necessary for the string allocator to be able to walk the stack to find all string references; the only execution-time cost of that would be requiring a consistent function-call approach [e.g. not using tail calls, and having every non-leaf function maintain a frame pointer]. Each memory area allocated with new would need to have a bit to indicate whether it contained any strings and those that do contain strings would need to have an index to a memory-layout descriptor, but those costs would be pretty slight.

If a GC wasn't able to walk the stack, then it would be necessary to have code use handles rather than pointers, creating string handles when local variables come into scope and destroying them when they go out of scope. That means much greater overhead.

supercat
0

Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.

Martin Beckett
  • 4
    I would not call copy-on-write strings immutable. Immutable strings are a subset of COW strings. That is, everything an immutable string can do a COW string can do as well, but the reverse is not true. It's these extra abilities that make COW strings suck for concurrent environments. – deft_code May 26 '10 at 22:01
  • And the advantage to thread safety is completely gone once you throw COW in the mix (you need to lock, either explicitly or inside the library itself) whenever you are performing a write to ensure thread safety. – David Rodríguez - dribeas May 26 '10 at 22:24
  • @David: Qt uses thread-safe COW; it does its own locking, with atomic integers for the reference count. – CMircea May 26 '10 at 22:35
  • 1
    @Caspin - true but if you are going to have immutable strings you might as well make efficient use of them with COW – Martin Beckett May 26 '10 at 22:56
  • 1
    @iconiK: That is the reason for the comment '(... or inside the library itself)'. The thing is that locking is required, and it can be a costly operation. The fact that it is hidden from the user means that there are fewer chances of doing it wrong in user code, but it does not take away the costs. If you compare that with Java immutable strings, you can copy references and know they will never be changed, you can create modifications with almost no cost at all (allocations in a generational GC are *fast* --10 cpu instructions). – David Rodríguez - dribeas May 27 '10 at 07:34
  • ... on the other end, in C++ allocations are slow, and moreover locking is also slow. If you read literature about `std::string` you will find out that the standard supports COW, and that some standard library implementations did it, but they are moving away from it as the advantage that they offered (less cost on copy unless there is a write) is smaller (in CPU time) than the cost they imply in a multithreaded environment. – David Rodríguez - dribeas May 27 '10 at 07:37
  • @Martin Beckett: *efficiency* is a term that depends on your usage pattern. COW in multithreaded environments requires locking operations, and those are costly, often more expensive than the copy itself unless there are a lot of copied strings that are not modified. – David Rodríguez - dribeas May 27 '10 at 13:10
  • 1
    Copy-on-write, as such, doesn't actually require locks; it just means that actions that appear to modify an instance actually point it to a new buffer, leaving the original alone. Replacing a pointer is almost always atomic. The hidden cost is in managing the lifespan of the original, which is usually done by reference counting. Even with interlocked operations, this counting is expensive, which is why std::string implementations have indeed moved away from it. In GC'd languages, like C#, this is a non-issue, so we have immutable strings, though without COW semantics. – Steven Sudit Jun 01 '10 at 16:12
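For readers who have not seen copy-on-write up close, here is a deliberately simple, single-threaded sketch of the mechanism discussed in these comments. It is not how Qt or any standard library implements it: cow_string and its members are invented names, and the use_count() check is only reliable when no other thread touches the same buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical single-threaded copy-on-write string. Copies share one
// buffer; a write detaches the writer onto a fresh buffer first.
class cow_string {
public:
    explicit cow_string(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    const std::string& get() const { return *buf_; }

    bool shares_buffer_with(const cow_string& other) const {
        return buf_ == other.buf_;
    }

    void append(const std::string& tail) {
        if (buf_.use_count() > 1) {
            // Shared: build the new contents elsewhere and repoint this
            // instance, leaving the original buffer alone.
            buf_ = std::make_shared<std::string>(*buf_ + tail);
        } else {
            // Sole owner: safe to modify in place.
            buf_->append(tail);
        }
    }

private:
    std::shared_ptr<std::string> buf_;  // reference-counted buffer
};
```

Copying a cow_string costs only a reference-count update. That update is exactly the bookkeeping the comments describe as expensive once it has to be atomic.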
0

constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...

fredoverflow
  • @Steven Maybe we are talking about different things when we say "value semantics". C# strings are always handled through a transparent level of indirection (reference semantics), whereas C++ strings are not (value semantics). – fredoverflow Jun 01 '10 at 17:12
  • Maybe. In C#, actual value types (such as int) inherit from System.ValueType and are passed as copies, while reference types are passed by reference and (normally) compared by reference. While C# strings are references, they have value semantics in that they're immutable and are compared by content, not address. In C++, a std::string is a value, but it contains a reference (pointer, actually) to a mutable buffer. Therefore, passing a copy of a C++ string invokes the copy constructor to duplicate the buffer, whereas passing a const reference avoids the overhead. I hope that's clearer. – Steven Sudit Jun 01 '10 at 17:33
-1

Strings are mutable in Ruby.

$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil
  • trivially thread safe

I would tend to forget safety arguments. If you want to be thread-safe, lock it, or don't touch it. C++ is not a convenient language; have your own conventions.

  • more secure

No. As soon as you have pointer arithmetic and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.

  • more memory efficient in most use cases.

Unless you implement CPU-intensive mechanisms, I don't see how.

  • cheap substrings (tokenizing and slicing)

That would be one very good point. Could be done by referring to a string with backreferences, where modifications to a string would cause a copy. Tokenizing and slicing become free, mutations become expensive.

Pierre Carrier
-5

C++ strings are thread safe. All immutable objects are guaranteed to be thread safe, but Java's StringBuffer is mutable, like a C++ string is, and both of them are thread safe. Why worry about speed? Define your method or function parameters with the const keyword to tell the compiler the string will be immutable in that scope. Also, a string object could become immutable on demand, deferring the work until you absolutely need to use the string. In other words, when you append other strings to the main string, you keep a list of strings until you actually need the whole string, and they are joined together at that point.

Immutable and mutable objects operate at the same speed, to my knowledge, except for their methods, which is a matter of pros and cons. Constant primitives and variable primitives move at different speeds because, at the machine level, variables are assigned to a register or a memory location, which requires a few binary operations, while constants are labels that don't require any of those and are thus faster (or less work is done). This works only for primitives and not for objects.
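The deferred joining described above (keep a list of pieces, concatenate only when the whole string is needed) can be sketched like this. The class name lazy_string and its members are hypothetical, chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical builder: appends are recorded as separate pieces and are
// only joined when the full string is requested.
class lazy_string {
public:
    void append(const std::string& piece) { pieces_.push_back(piece); }

    std::size_t piece_count() const { return pieces_.size(); }

    // Joins all pieces with a single allocation for the result.
    std::string str() const {
        std::size_t total = 0;
        for (const std::string& p : pieces_) total += p.size();

        std::string out;
        out.reserve(total);
        for (const std::string& p : pieces_) out += p;
        return out;
    }

private:
    std::vector<std::string> pieces_;
};
```

Each append still copies its piece into the list, so the saving is in avoiding repeated reallocation of one growing buffer, not in avoiding copies altogether.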

yomi