I have written a parser that, it turns out, works incorrectly with UTF-8 text.
The parser is very simple:
while (pos < end) {
    // find some ASCII char
    if (text.at(pos) == '@') {
        // Check some conditions and if the syntax is wrong...
        if (...)
            createDiagnostic(pos);
    }
    pos++;
}
As you can see, I am creating a diagnostic at pos. But that pos is wrong if there were any non-ASCII characters before it, because in UTF-8 such a character actually consists of more than one char. How do I correctly skip the UTF-8 chars as if each were one character?
I need this because the diagnostics are sent to UTF-8-aware VSCode.
I tried to read some articles on UTF-8 in C++, but all the material I found is huge, and I only need to skip over the UTF-8 characters.
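For reference, here is a minimal sketch of the kind of thing I am after, assuming text is a std::string holding valid UTF-8 and pos is a byte offset into it. The helper names isUtf8Continuation, codePointIndex, and utf16Index are hypothetical and not part of my parser. It relies on the fact that every byte of a multi-byte UTF-8 sequence except the first has the bit pattern 10xxxxxx, so counting only the other bytes counts characters (code points). The second function is there because, as far as I understand, the Language Server Protocol counts positions in UTF-16 code units by default, where a 4-byte UTF-8 sequence takes two units.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the byte is a UTF-8 continuation byte (10xxxxxx).
inline bool isUtf8Continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: number of code points in the bytes [0, bytePos).
// Assumes text is valid UTF-8; malformed input is not detected.
inline std::size_t codePointIndex(const std::string& text, std::size_t bytePos) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytePos && i < text.size(); ++i) {
        // Count only the first byte of each sequence.
        if (!isUtf8Continuation(static_cast<unsigned char>(text[i])))
            ++count;
    }
    return count;
}

// Hypothetical helper: number of UTF-16 code units in the bytes [0, bytePos).
// A lead byte of 0xF0 or above starts a 4-byte sequence, which encodes a
// code point outside the BMP and therefore needs a surrogate pair in UTF-16.
inline std::size_t utf16Index(const std::string& text, std::size_t bytePos) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytePos && i < text.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        if (!isUtf8Continuation(byte))
            count += (byte >= 0xF0) ? 2 : 1;
    }
    return count;
}
```

With this, the loop could keep scanning bytes as it does now and call something like createDiagnostic(codePointIndex(text, pos)), or the UTF-16 variant, instead of passing pos directly. Recomputing from the start on every diagnostic is linear in pos, so for large inputs a running counter updated inside the loop would be the cheaper variant.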