I have a function in a large codebase that checks for invalid characters. It looks something like this:
    validateMe(std::string myString)
    {
        for (int i = 0; i < myString.length(); i++)
        {
            if ((myString[i] == 0x7E) || ...)
            {
                return NOT_VALID_STRING;
            }
        }
        return VALID_STRING;
    }
Before validateMe is called, the string is converted to UTF-8.
This worked fine until it was tested with Chinese characters.
I'm going through http://utf8everywhere.org/ to understand all of this better, but it's a pretty deep rabbit hole.
I guess I have to find the code points somehow, test whether each one falls in the range where the invalid characters are, and only then look for the invalid characters. But how do I find the code points?
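To make the "find the code points" step concrete, here is a minimal sketch of a UTF-8 decoder. The name decodeUtf8 is hypothetical and not part of the codebase, and the sketch assumes the input really is UTF-8; it returns false when it is not.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original codebase): decodes a UTF-8 byte
// string into Unicode code points. Returns false if the input is malformed.
bool decodeUtf8(const std::string& s, std::vector<char32_t>& out)
{
    // Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
    static const char32_t minForExtra[] = {0x0, 0x80, 0x800, 0x10000};

    out.clear();
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false; // stray continuation byte or invalid lead byte

        if (extra > s.size() - i - 1) return false; // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates and values beyond U+10FFFF.
        if (cp < minForExtra[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}
```

With the code points in hand, the check for '~' would become a comparison of each element against 0x7E instead of a comparison of raw bytes.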
I've read that std::string should be able to handle this, but
myString.find("~") != std::string::npos
fails with Chinese characters, I guess because the first bytes of the Chinese character are 0x7E. At least for the ones I've tried.
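One way to test that guess would be to dump the raw bytes that actually reach the function. In well-formed UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so a 0x7E in the dump would suggest that the bytes are not the UTF-8 they are assumed to be. The helper name hexDump below is hypothetical; this is only a diagnostic sketch.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper: renders each byte of s as two uppercase
// hex digits, separated by single spaces.
std::string hexDump(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Logging hexDump(myString) at the top of validateMe would show whether a 0x7E byte is really present for the Chinese input, or whether only the debugger's rendering contains a '~'.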
So, how do I check for invalid characters in a string that could be written in Chinese? Let's assume that "Chinese" here means EUC-CN.
EDIT:
validateMe("testme") should pass
validateMe("test~me") should NOT pass
When the user enters the characters "啊是的发" (that is, the first character suggested for each letter of "asdf" in Chinese EUC-CN) through the GUI, the function fails: it finds "~", or 0x7E. The VS debugger indeed shows the input as 啊是的å‘, which has a '~'.
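The three cases can be reproduced outside the GUI with a small stand-alone version of the check. This is a sketch under several assumptions: the enum and return type are invented here because the original does not show them, only the '~' test is included because the other conditions are elided, and "啊是的发" is assumed to arrive as well-formed UTF-8 (bytes E5 95 8A E6 98 AF E7 9A 84 E5 8F 91).

```cpp
#include <string>

// Assumed definitions: the original post shows neither the return type
// nor how these constants are declared.
enum ValidationResult { VALID_STRING, NOT_VALID_STRING };

// Sketch that checks only for '~' (0x7E); the remaining conditions of the
// real function are elided in the original and are not reproduced here.
ValidationResult validateMe(const std::string& myString)
{
    for (std::string::size_type i = 0; i < myString.length(); ++i) {
        // Compare as unsigned char so bytes >= 0x80 are not seen as negative.
        const unsigned char byte = static_cast<unsigned char>(myString[i]);
        if (byte == 0x7E) {
            return NOT_VALID_STRING;
        }
    }
    return VALID_STRING;
}
```

Calling validateMe("\xE5\x95\x8A\xE6\x98\xAF\xE7\x9A\x84\xE5\x8F\x91") exercises the Chinese case with explicit bytes. If that passes in isolation, the '~' would presumably be introduced by the conversion step that runs before validateMe rather than by the loop itself.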