Since std::string is actually a typedef of a class template instantiation (std::basic_string&lt;char&gt;), how can I override it? I want to make a UTF-8 std::string that will return the correct length, among other things.
8 Answers
If you must define your own string type, then don't inherit from std::string
but define your own Character Traits class and do something like
    typedef std::basic_string<unsigned char, utf8_traits> utf8string;
See also Herb Sutter's website.
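To illustrate, here is a minimal sketch of what such a traits class could look like. The name utf8_traits and everything inside it are illustrative assumptions, not a complete, conforming implementation; a real traits class needs every member required by the standard's character traits requirements, and careful thought about compare(), find(), and the conversion state:

    #include <cstring>  // memcmp, memchr, memmove, memcpy, memset, strlen
    #include <cwchar>   // std::mbstate_t
    #include <ios>      // std::streamoff, std::streampos
    #include <string>

    // Sketch only: traits for UTF-8 code units stored as unsigned char.
    struct utf8_traits
    {
        typedef unsigned char  char_type;
        typedef int            int_type;
        typedef std::streamoff off_type;
        typedef std::streampos pos_type;
        typedef std::mbstate_t state_type; // shift state for codecvt facets

        static void assign(char_type& r, const char_type& c) { r = c; }
        static bool eq(char_type a, char_type b) { return a == b; }
        static bool lt(char_type a, char_type b) { return a < b; }

        static int compare(const char_type* a, const char_type* b, std::size_t n)
        { return std::memcmp(a, b, n); }

        static std::size_t length(const char_type* s)
        { return std::strlen(reinterpret_cast<const char*>(s)); }

        static const char_type* find(const char_type* s, std::size_t n, const char_type& c)
        { return static_cast<const char_type*>(std::memchr(s, c, n)); }

        static char_type* move(char_type* d, const char_type* s, std::size_t n)
        { return static_cast<char_type*>(std::memmove(d, s, n)); }

        static char_type* copy(char_type* d, const char_type* s, std::size_t n)
        { return static_cast<char_type*>(std::memcpy(d, s, n)); }

        static char_type* assign(char_type* s, std::size_t n, char_type c)
        { return static_cast<char_type*>(std::memset(s, c, n)); }

        static int_type  to_int_type(char_type c) { return c; }
        static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
        static bool eq_int_type(int_type a, int_type b) { return a == b; }
        static int_type eof() { return -1; }
        static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
    };

    typedef std::basic_string<unsigned char, utf8_traits> utf8string;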

- +1 - there is a reason `std::string` is a `typedef`, and this is it. – Steve Townsend Nov 17 '10 at 14:23
- On the other hand, I would not use a `basic_string` to handle UTF-8 sequences. You'll do more harm than good whenever you try to manipulate it. – Matthieu M. Nov 17 '10 at 14:56
- @Matthieu M., that depends entirely on how you implement the character traits. I can see two options: either pack them as UTF-8 directly and implement the `state_type`, or pack them as UTF-32 and convert at the boundary. Either way, it's a lot of work, but it retains compatibility with STL algorithms. – Fred Foo Nov 17 '10 at 15:49
DON'T DERIVE FROM STRING
std::string, that is, basically the whole basic_string template, is not designed to be derived from. There are zillions of articles about that already. It doesn't have any virtual functions, so there is nothing to override; the best you can do is hide something. Use composition/aggregation instead! That is, just keep a member of type string in your class and forward the calls (a rough sketch follows below). Again, just to make sure:
DON'T DERIVE FROM STRING
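A rough sketch of the composition approach (the class and member names here are purely illustrative):

    #include <cstddef>
    #include <string>

    // Illustrative only: hold a std::string and forward what you need,
    // adding UTF-8-aware operations alongside the byte-based ones.
    class Utf8String {
    public:
        explicit Utf8String(const std::string& bytes) : data_(bytes) {}

        // Forwarded call: size in bytes, exactly as std::string reports it.
        std::size_t byte_size() const { return data_.size(); }

        // New behaviour: count code points by skipping UTF-8
        // continuation bytes (those of the form 10xxxxxx).
        std::size_t length() const {
            std::size_t n = 0;
            for (std::size_t i = 0; i < data_.size(); ++i)
                if ((static_cast<unsigned char>(data_[i]) & 0xC0) != 0x80)
                    ++n;
            return n;
        }

        const std::string& bytes() const { return data_; }

    private:
        std::string data_;
    };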

- -1. Best practice for this task is to provide a character traits object and instantiate `std::basic_string`. Hiding an `std::string` in this case raises the issue of `char` signedness, since UTF-8 is an 8-bit multibyte encoding. – Fred Foo Nov 17 '10 at 14:16
- @larsmans: No objections :) I was just too preoccupied with the fact that someone somewhere had the thought to derive from a non-polymorphic type :) – Armen Tsirunyan Nov 17 '10 at 14:44
- -1. Doesn't really answer the question. larsman's answer at least gives a good idea for a direction to go. – leetNightshade Jul 10 '12 at 16:33
- -1. I don't find this argument convincing. It presumes that only classes which have virtual functions can be derived from. If `basic_string` doesn't have virtual functions, then the **guideline** should be: DON'T USE IT POLYMORPHICALLY. There is nothing wrong with inheritance if you have documented it properly! – Nawaz Jan 29 '13 at 13:58
It is generally considered a mistake in C++ to derive from a standard library container. However, the functionality you are looking for has already been implemented. Have a look at Glib::ustring.
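For example (assuming glibmm is installed; Glib::ustring's length() counts characters, while bytes() reports the raw size):

    #include <glibmm/ustring.h>
    #include <iostream>

    int main()
    {
        // "héllo" in UTF-8: five characters, six bytes.
        Glib::ustring s("h\xC3\xA9llo");
        std::cout << s.length() << '\n'; // 5 code points
        std::cout << s.bytes()  << '\n'; // 6 bytes
    }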
Hope this helps!

Have you looked at ICU?
A typedef is just a convenient label.
    class foo : public bar { };
works just fine when bar is a typedef of a class template instantiation.
It may not be a good idea in this case, but the language supports it.
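A minimal demonstration that this compiles (whether it is wise is another matter, for the reasons given in the other answers):

    #include <string>

    // std::string names a concrete class, std::basic_string<char>,
    // so the language allows it as a base class.
    class foo : public std::string {
    public:
        foo(const std::string& s) : std::string(s) {}
    };

    int main()
    {
        foo f(std::string("hello"));
        return static_cast<int>(f.size()); // inherited members work as usual
    }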

Just be sure you know what you are doing first. What exactly is the "correct length" you want your string objects to return? The number of code points? That does not always correspond to the number of characters as perceived by the user.
Anyway, take a look at the utf8-cpp library to see an alternative to deriving from std::string.
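For instance, a small sketch using utf8-cpp's utf8::distance to count code points in an ordinary std::string (assuming the library's utf8.h header is on your include path):

    #include <cstddef>
    #include <string>
    #include "utf8.h" // from the utf8-cpp library

    int main()
    {
        std::string s = "h\xC3\xA9llo";   // "héllo": 6 bytes, 5 code points
        std::size_t bytes = s.size();     // 6
        std::size_t code_points =
            static_cast<std::size_t>(utf8::distance(s.begin(), s.end())); // 5
        return static_cast<int>(bytes - code_points);
    }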

Better idea: create an STL-compatible utf8_string container without inheriting from std::string.
Writing a Unicode implementation that conforms and works properly in every circumstance is very difficult. I would advise you to use an existing library or implementation instead of rolling your own. For example, Windows, OS X and Qt all have libraries which support UTF-16 and other encoded strings.
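For example, with Qt (a sketch; note that QString::size() counts UTF-16 code units, which is not the same as user-perceived characters either):

    #include <QString>

    int main()
    {
        // QString stores text as UTF-16; fromUtf8 decodes a UTF-8 buffer.
        QString s = QString::fromUtf8("h\xC3\xA9llo");
        return static_cast<int>(s.size()); // 5 UTF-16 code units here
    }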

As has already been stated by others: don't derive from std::string; it's just not designed for this.
You should have a look at this article, which shows how to create a case-insensitive string class as an example. You will see that the logic implemented in std::basic_string is independent of the character type, and that providing some custom char_traits should do the trick.
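Roughly sketched, the classic example from that article: traits whose comparisons ignore case, plugged into basic_string (a complete version would also adapt find()):

    #include <cctype>
    #include <cstddef>
    #include <string>

    // Sketch of case-insensitive character traits: only the comparison
    // members change; everything else is inherited from char_traits<char>.
    struct ci_traits : std::char_traits<char>
    {
        static bool eq(char a, char b)
        { return std::tolower((unsigned char)a) == std::tolower((unsigned char)b); }

        static bool lt(char a, char b)
        { return std::tolower((unsigned char)a) < std::tolower((unsigned char)b); }

        static int compare(const char* a, const char* b, std::size_t n)
        {
            for (; n != 0; ++a, ++b, --n) {
                if (lt(*a, *b)) return -1;
                if (lt(*b, *a)) return  1;
            }
            return 0;
        }
    };

    typedef std::basic_string<char, ci_traits> ci_string;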

- Actually I would not recommend doing this. I did this a few years ago, and I had to regret that decision. What will happen is that you are going to have to convert back and forth between this new type and the standard string type, all over your code base. It's not pretty. A great article that explains it in detail is found here: http://lafstern.org/matt/col2_new.pdf. Summary: case insensitivity isn't about an object, it's about how you use an object. – Daniel Lidström Nov 17 '10 at 14:12