Is there something like an `intern()` method in C or C++, like there is in Java? If there isn't, how can I carry out string interning in C or C++?

- Just code exactly what you want. – David Schwartz May 17 '12 at 11:34
- Suhail, have you looked at these questions: http://stackoverflow.com/questions/1116040/memory-efficient-c-strings-interning-ropes-copy-on-write-etc , http://stackoverflow.com/questions/4060411/does-stdstring-use-string-interning ? – dbf May 17 '12 at 11:35
- @David Schwartz A caching-like functionality. I want string interning. – Suhail Gupta May 17 '12 at 11:35
- Sounds like you're looking for `boost::flyweight< std::string >`; all identical strings will use the same memory. – Ylisar May 17 '12 at 11:39
- Is there something like intern() method in "C/C++"? No. There is no C/C++. QED. – R. Martinho Fernandes May 17 '12 at 11:41
- @Ylisar I think that is what is known as _string interning_! I don't know of any library. – Suhail Gupta May 17 '12 at 11:41
- @R. Martinho Fernandes I asked about C **AND** C++. – Suhail Gupta May 17 '12 at 11:42
- @SuhailGupta Well, those are two different questions. If you really care about the answer to *both*, you should make two posts. – R. Martinho Fernandes May 17 '12 at 11:44
- Have a look at flyweight: http://www.boost.org/doc/libs/1_49_0/libs/flyweight/doc/index.html – Nick May 17 '12 at 11:50
- @Shog9 Did 'you' merge the questions? If yes, then which answer should I accept: the one that answers the C query or the C++ one? And you changed the meaning of my question. I had _and_ instead of _or_. – Suhail Gupta May 18 '12 at 07:03
- @Suhail: you're either going to implement this in C **or** C++. So decide which, and then accept the corresponding answer. And no, I didn't close or merge this, just edited after the fact to allow answers on either language to suffice. – Shog9 May 18 '12 at 07:32
3 Answers
`boost::flyweight< std::string >` seems to be exactly what you're looking for.

- @SuhailGupta If I knew of another way, I would add another answer. – Erick Robertson May 17 '12 at 12:28
- If you can bear the interface, `typedef std::hash_set< std::string > StringCache;` will net you a less fancy version of what you're looking for. On its own, the C++ standard library is very bare-bones compared to most other languages. – Ylisar May 17 '12 at 12:41
- Note that `boost::flyweight` requires the objects to be immutable; this isn't the case for `std::string`. Things like `[]` are likely to cause problems (or not, depending on how the objects are later used). – James Kanze May 17 '12 at 13:12
- More precisely, `boost::flyweight` makes the object immutable; `[]` won't cause problems because `boost::flyweight< T >` only ever exposes `const T&`. – Ylisar May 17 '12 at 13:47
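For reference, the set-based cache suggested in the comments can be sketched with only the standard library. This is a minimal sketch under assumptions, not how Boost implements `flyweight`: it assumes C++11 and uses `std::unordered_set` (the standard container, since `hash_set` itself was never part of the standard library). `StringCache::intern` and `demo_same_address` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper; not part of the standard library or Boost.
class StringCache {
public:
    // Returns a pointer to the single stored copy of `s`.
    // Pointers to unordered_set elements stay valid when the set grows,
    // so the result can be compared by address for as long as the cache lives.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;  // insert is a no-op if `s` is already stored
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Usage sketch: equal contents yield the same address.
bool demo_same_address() {
    StringCache cache;
    const std::string* a = cache.intern("hello");
    const std::string* b = cache.intern(std::string("hel") + "lo");
    const std::string* c = cache.intern("world");
    assert(a != c);
    return a == b;
}
```

Handing out `const std::string*` keeps the stored strings immutable from the caller's side, which is the same property the last two comments discuss for `boost::flyweight`. Nothing is ever removed from this cache, so it only suits cases where the set of distinct strings stays bounded.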
Is there something like intern() method in C like we have in Java?
Not in the standard C library.
If there isn't, how to carry out string interning in C?
With great difficulty, I fear. The first problem is that "string" is not a well-defined thing in C. Instead you have `char *`, which might point at a zero-terminated string, or might just denote a character position. Then you've got the problem that some strings are embedded in other things ... or are stored on the stack. Both of which make interning impossible and/or meaningless. Then, there is the problem that C string literals are not guaranteed to be interned ... in the way that Java guarantees it. Finally, there is the problem that interning is a storage leak waiting to happen ... if the language is not garbage collected.
Having said that, the way to (attempt to) implement interning in C would be to create a hash table to hold the interned strings. You'd need to make it a precondition that you cannot intern a string unless it is either a literal or a string allocated in its own heap node. To address the storage leak issue, you'd need a per-string reference count to detect when an interned string can be discarded.
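The hash-table-plus-reference-count idea can be sketched roughly as follows. Note the assumptions: this is written in C++ rather than C so that the standard `std::unordered_map` can stand in for the hash table (in C you would have to write the table, the hashing and the memory management yourself), and `CountedInternTable`, `acquire` and `release` are hypothetical names, not an existing API. Because the table copies each string into its own storage, it sidesteps the precondition about literals and heap nodes rather than enforcing it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical reference-counted intern table.
class CountedInternTable {
public:
    // Interns `s` and takes one reference to the shared copy.
    const std::string* acquire(const std::string& s) {
        auto it = table_.emplace(s, 0).first;  // existing entry is reused
        ++it->second;                          // one more holder
        return &it->first;                     // address of the stored key
    }

    // Drops one reference; the string is discarded when the count hits zero.
    // `p` must be a pointer previously returned by acquire and not yet
    // released more times than it was acquired.
    void release(const std::string* p) {
        auto it = table_.find(*p);
        if (it == table_.end()) return;
        if (--it->second == 0) table_.erase(it);
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, std::size_t> table_;
};

// Usage sketch: two holders share one entry, which disappears
// only after both have released it.
void demo_counted_intern() {
    CountedInternTable t;
    const std::string* a = t.acquire("abc");
    const std::string* b = t.acquire(std::string("ab") + "c");
    assert(a == b && t.size() == 1);
    t.release(a);
    assert(t.size() == 1);  // still held through b
    t.release(b);
    assert(t.size() == 0);  // last reference gone, entry discarded
}
```

The manual `acquire`/`release` pairing is exactly where the leak risk mentioned above comes back in: a forgotten `release` keeps the string alive forever, and an extra one leaves a dangling pointer.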

What would string interning mean in a language which has value
semantics? Interning is a mechanism to force object identity for
references to strings with value identity. It's relevant in languages
which use reference semantics and use object identity as the default
comparison function. C++ uses value semantics by default, and types
like `std::string` don't have identity, so interning makes no sense.
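A small illustration of the distinction, as a sketch only (the function names are hypothetical): in C++, `==` on `std::string` already compares contents, so the address comparison that interning speeds up in Java, where `==` on objects compares references, is not the default comparison here.

```cpp
#include <cassert>
#include <string>

// Value comparison: what std::string's operator== does.
bool equal_by_value(const std::string& a, const std::string& b) {
    return a == b;
}

// Identity comparison: are these the very same object?
bool same_object(const std::string& a, const std::string& b) {
    return &a == &b;
}

// Usage sketch.
void demo_value_semantics() {
    std::string a = "intern";
    std::string b = a;             // copy: a distinct object with equal contents
    assert(equal_by_value(a, b));  // == compares contents
    assert(!same_object(a, b));    // yet the objects are distinct
    assert(same_object(a, a));
}
```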
Some implementations (e.g. g++) may use a form of reference semantics for the string data, behind the scenes. Such an implementation could offer some sort of interning of that data, as an extension. (G++ doesn't, as far as I know, but does automatically "intern" empty strings.)
Most other implementations don't even use reference semantics internally. How would you intern strings in an implementation that uses the small string optimization (like Microsoft's), where in some cases the data is stored directly in the object and there is no dynamically allocated memory?
