Given a std::string containing text encoded in an arbitrary but known character set, what is the easiest way in C++ to count the characters? It should handle things like combining characters and Unicode code points.

It would be nice to have something like:

std::string test = "éäöü";
std::cout << test.size("utf-8") << std::endl;

Unfortunately, life isn't always easy with C++. :)

For Unicode, I have seen that one can use the ICU library: Cross-platform iteration of Unicode string (counting Graphemes using ICU)

But is there a more general solution?

Pascal

1 Answer

I'm afraid it depends on the particular encoding. If you use UTF-8 (and I really don't see why you should not), you could use UTF8-CPP.

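It provides utf8::distance, which counts the code points (not grapheme clusters) between two iterators into a UTF-8 sequence:

::std::string test = "éäöü";
auto length = ::utf8::distance(test.begin(), test.end());
::std::cout << length << "\n"; // prints 4 if the literal is UTF-8 in precomposed form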
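
For illustration, the same kind of code point count can be sketched without a library. This is a minimal sketch, not UTF8-CPP's implementation: the function name count_utf8_code_points is hypothetical, and it assumes the input is already valid UTF-8 (it does no validation). It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting all other bytes counts code points.

```cpp
#include <cassert>  // only used by assert-based checks
#include <cstddef>
#include <string>

// Hypothetical helper (not part of UTF8-CPP): counts Unicode code points
// in a string that is ASSUMED to contain valid UTF-8. No validation is done.
std::size_t count_utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx; any other byte
        // begins a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Like utf8::distance, this counts code points rather than user-perceived characters: an "e" followed by the combining acute accent U+0301 (bytes 0x65 0xCC 0x81) counts as 2. Counting grapheme clusters needs Unicode segmentation rules, which is what the ICU approach mentioned in the question provides.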
bitmask
  • FYI: I am currently implementing a web service framework that has a feature to validate the length of submitted text. So the encoding depends on whatever the user wants to use. Most of them will use UTF-8 but it should also work with others. – Pascal Feb 25 '15 at 13:04