I have a UTF-8 string given as a null-terminated const char*. I would like to know if the first letter of this string is an a by itself. The following code
    bool f(const char* s) {
        return s[0] == 'a';
    }
is wrong, as the first letter (grapheme cluster) of the string might be à, made from 2 Unicode scalar values: a (U+0061) and a combining grave accent (U+0300). So this very simple question seems extremely difficult to answer unless you know how grapheme clusters are formed.
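To make the problem concrete, here is a minimal sketch of a partial check. It decodes the code point that follows the leading a and rejects the string if that code point lies in the Combining Diacritical Marks block (U+0300 to U+036F). The names decode_utf8, is_combining_diacritic and starts_with_plain_a are hypothetical, and the single-block test is a simplifying assumption: real grapheme cluster segmentation depends on Unicode property tables and covers many more cases (other combining blocks, zero-width joiners, and so on), so this is not a complete answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the UTF-8 code point starting at s (null-terminated input).
   Returns 0 at the end of the string and U+FFFD for malformed input.
   Simplified: overlong encodings and surrogates are not rejected. */
static uint32_t decode_utf8(const unsigned char *s) {
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80)
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
        return ((uint32_t)(s[0] & 0x07) << 18) |
               ((uint32_t)(s[1] & 0x3F) << 12) |
               ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
    return 0xFFFD;
}

/* Assumption: only the Combining Diacritical Marks block is checked. */
static bool is_combining_diacritic(uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

/* True if s starts with 'a' that is not followed by a combining
   diacritic from the block above. */
bool starts_with_plain_a(const char *s) {
    if (s[0] != 'a')
        return false;
    uint32_t next = decode_utf8((const unsigned char *)s + 1);
    return !is_combining_diacritic(next);
}
```

With this sketch, "abc" is accepted, while "a\xCC\x80" (a followed by U+0300, which renders as à) is rejected even though its first byte is 'a'.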
Still, many libraries parse UTF-8 files (YAML files, for instance) and therefore should be able to answer this kind of question. But these libraries don't seem to depend upon a Unicode library.
So my questions are:
1. How to write code that checks if a string starts with the letter a?
2. Assuming that there is no simple answer to the first question, how do parsers (such as YAML parsers) manage to parse files without being able to answer this kind of question?