3

I'm looking at the IsCharAlphaNumeric Windows API function. Since it takes only a single TCHAR, it obviously can't make any decisions about surrogate pairs in UTF-16 content. Does that mean that there are no alphanumeric characters that are surrogate pairs?

Puppy

3 Answers

5

Characters outside the BMP can be letters. (Michael Kaplan recently discussed a bug in the classification of the character U+1F48C.) But IsCharAlphaNumeric cannot see characters outside the BMP (for the reasons you noted), so you cannot obtain classification information for them that way.

If you have a surrogate pair, call GetStringType with cchSrc = 2 and check for C1_ALPHA and C1_DIGIT.

Edit: The second half of this answer is incorrect. GetStringType does not support surrogate pairs.
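Whichever API ends up doing the classification, the caller first has to notice that two adjacent UTF-16 code units form a pair, which a function taking a single TCHAR cannot do. A minimal sketch of that check in portable C++ follows; `is_high_surrogate`, `is_low_surrogate`, and `count_supplementary` are hypothetical helper names for illustration, not Windows APIs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers (not Windows APIs): classify one UTF-16 code unit.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts well-formed surrogate pairs, i.e. characters outside the BMP.
std::size_t count_supplementary(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && is_low_surrogate(s[i + 1])) {
            ++n;
            ++i;  // skip the low half of the pair just counted
        }
    }
    return n;
}
```

Each half of a pair is meaningless on its own, so any per-code-unit classifier can at best report "this is a surrogate", never "this is a letter".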

Raymond Chen
0

By looking at the Unicode plane assignments, you can determine for yourself what you are missing by not being able to inspect non-BMP code points.

For example, you won't be able to identify Imperial Aramaic characters as alphanumeric. Shame.
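To make the Imperial Aramaic case concrete: the block sits at U+10840 to U+1085F, above U+FFFF, so every character in it needs two UTF-16 code units. The sketch below shows the standard UTF-16 encoding arithmetic; `SurrogatePair` and `to_surrogates` are hypothetical names, and the function assumes its input is already in the range U+10000 to U+10FFFF.

```cpp
#include <cstdint>

// Hypothetical type and helper for illustration only.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Splits a supplementary code point (assumed U+10000..U+10FFFF) into
// its UTF-16 high and low surrogate code units.
constexpr SurrogatePair to_surrogates(char32_t cp) {
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))};
}

// U+10840 (IMPERIAL ARAMAIC LETTER ALEPH) becomes the pair D802 DC40.
static_assert(to_surrogates(0x10840).high == 0xD802, "high surrogate");
static_assert(to_surrogates(0x10840).low == 0xDC40, "low surrogate");
```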

Kerrek SB
0

Does that mean that there are no alphanumeric characters that are surrogate pairs?

No, there are supplementary code-points that are in the letter group.

From "Comparing a char to a code-point?":

For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
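The arithmetic behind that statement can be sketched in C++ (the quoted call itself is Java). `combine_surrogates` is a hypothetical helper that assumes a valid high/low pair; it shows that U+D840 followed by any low surrogate lands in U+20000 to U+203FF, which lies inside the CJK Unified Ideographs Extension B block.

```cpp
#include <cstdint>

// Hypothetical helper: combines a valid high/low surrogate pair into the
// code point it encodes. No validation is done; the inputs are assumed
// to be in 0xD800..0xDBFF and 0xDC00..0xDFFF respectively.
constexpr char32_t combine_surrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// The lowest and highest code points reachable from the high surrogate D840.
static_assert(combine_surrogates(0xD840, 0xDC00) == 0x20000, "first");
static_assert(combine_surrogates(0xD840, 0xDFFF) == 0x203FF, "last");
```

A classifier needs the combined code point as input, so an API that accepts one 16-bit unit has nothing meaningful to classify when handed 0xD840 alone.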

Mike Samuel