I want to write a regex that matches an IDN using Unicode categories (.NET engine). Spoofing prevention is not essential for my goals, so confusable characters don't have to be excluded.

I found some lists of individual characters (e.g. https://www.icann.org/en/system/files/files/idna-protocol-2003-2008.txt); however, I want character categories so that I don't have to update the regex whenever a new Unicode version comes out.

Andrey Shchekin

Closely related question: [How to validate a unicode email?](https://stackoverflow.com/questions/19461943/how-to-validate-a-unicode-email/19477481) – nwellnhof Nov 01 '16 at 14:28

1 Answer

All characters listed with the status "PVALID" in "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)" (RFC 5892) are valid in an IDN. Characters with the status "CONTEXTJ" or "CONTEXTO" are also valid, but only under certain contextual conditions.

If you want to dive deeper into the topic, the documentation released by the Universal Acceptance Steering Group is worth looking at.
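As a rough illustration of a category-based check, here is a minimal sketch. It is not the exact PVALID set: it only uses the general categories that RFC 5892 groups under its "LetterDigits" rule (Ll, Lu, Lo, Lm, Mn, Mc, Nd) plus the hyphen, and it ignores the RFC's exceptions, the "Unstable" rule, the CONTEXTJ/CONTEXTO rules, label length limits, and the ban on a leading combining mark. The class and method names are made up for the example. It is written in Java so that supplementary-plane characters are matched as whole code points; the pattern itself uses only `\p{..}`, `\A`, and `\z`, which the .NET engine also supports, although .NET matches UTF-16 code units, so characters outside the BMP would need separate handling there.

```java
import java.util.regex.Pattern;

// Hypothetical helper: an over-approximation of "looks like an IDN",
// built from Unicode general categories instead of a fixed character list.
final class IdnLabelSketch {
    // One label character: the general categories of RFC 5892 "LetterDigits".
    private static final String LABEL_CHAR =
        "[\\p{Ll}\\p{Lu}\\p{Lo}\\p{Lm}\\p{Mn}\\p{Mc}\\p{Nd}]";

    // A label starts and ends with a label character; hyphens may only appear inside.
    private static final String LABEL =
        LABEL_CHAR + "(?:(?:" + LABEL_CHAR + "|-)*" + LABEL_CHAR + ")?";

    // A domain is one or more labels separated by single dots (no trailing dot).
    private static final Pattern DOMAIN =
        Pattern.compile("\\A" + LABEL + "(?:\\." + LABEL + ")*\\z");

    static boolean isPlausibleDomain(String s) {
        return DOMAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleDomain("пример.рф"));     // true
        System.out.println(isPlausibleDomain("日本語.jp"));       // true
        System.out.println(isPlausibleDomain("-bad.example"));  // false
        System.out.println(isPlausibleDomain("exa mple.com"));  // false
    }
}
```

Because the character class is defined by categories, it picks up newly assigned letters and digits whenever the runtime's Unicode tables are updated. The price is that it accepts some strings a strict IDNA2008 validator would reject.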

ThinkTrans