Looking at UAX #15 Section 9, there is sample code to check for normalization. That code uses the NFC_QC property and checks CCC ordering, as expected. It looks great except for one line that puzzles me: if (Character.isSupplementaryCodePoint(ch)) ++i;
It seems to be saying that if a character is supplementary (i.e. >= 0x10000), then I can just assume the next character passes the quick check, without bothering to check the NFC_QC property or CCC ordering on it.
In theory I could have, say, a starter code point, followed by a supplementary code point with CCC > 0, followed by a third code point that either has CCC > 0 but lower than that of the second code point, or has NFC_QC == No; and the string would STILL pass the NFC quick check, even though it would not seem to be in NFC form. There are a number of supplementary code points with CCC of 7, 9, 216, 220, or 230, so there seem to be many ways to hit this case. I suppose this can only work if we can assume that, throughout all future versions of Unicode, every supplementary character with CCC > 0 will also have NFC_QC == No.
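To make the scenario concrete, here is a minimal, self-contained sketch of such a string (the class name is mine; the property values are taken from UnicodeData, assuming U+101FD PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE has CCC=220 and U+0334 COMBINING TILDE OVERLAY has CCC=1). The marks are out of canonical order, so java.text.Normalizer reports the string as not normalized:

```java
import java.text.Normalizer;

public class OutOfOrderProbe {
    public static void main(String[] args) {
        // Starter, then U+101FD (supplementary, CCC=220), then U+0334 (CCC=1).
        // CCC 220 followed by CCC 1 is out of canonical order, so this
        // string is not in NFC.
        String s = "A" + "\uD800\uDDFD" + "\u0334"; // U+101FD as a surrogate pair
        System.out.println(Normalizer.isNormalized(s, Normalizer.Form.NFC)); // prints "false"
    }
}
```

The question is whether the quick-check code above would also return NO for this string, given that the CCC=220 code point is supplementary.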
Is this sample code correct? If so, why is the supplementary check valid? Are there cases that would produce incorrect results if that check were removed?
Here is the code snippet, copied directly from that link:
public int quickCheck(String source) {
    short lastCanonicalClass = 0;
    int result = YES;
    for (int i = 0; i < source.length(); ++i) {
        int ch = source.codePointAt(i);
        if (Character.isSupplementaryCodePoint(ch)) ++i;
        short canonicalClass = getCanonicalClass(ch);
        if (lastCanonicalClass > canonicalClass && canonicalClass != 0) {
            return NO;
        }
        int check = isAllowed(ch);
        if (check == NO) return NO;
        if (check == MAYBE) result = MAYBE;
        lastCanonicalClass = canonicalClass;
    }
    return result;
}
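For reference, here is a small sketch (class name is mine) of how a supplementary code point is laid out in a Java String. String.codePointAt indexes by UTF-16 char units, not by code points, and a supplementary code point occupies two char positions:

```java
public class SurrogateLayout {
    public static void main(String[] args) {
        // U+101FD stored as the surrogate pair D800 DDFD.
        String s = "A\uD800\uDDFD";
        System.out.println(s.length());                      // prints "3" (char units)
        System.out.println(s.codePointCount(0, s.length())); // prints "2" (code points)
        // codePointAt(1) reassembles the full code point from the pair.
        System.out.println(Integer.toHexString(s.codePointAt(1))); // prints "101fd"
    }
}
```

So the `++i` in the loop advances past the second char unit of the pair; my question is whether that is all it does, or whether it can also skip a following code point's checks as described above.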