Say I want usernames to only consist of letters and digits regardless of language.
I think I might accomplish this with the following regex parts
(?>\p{L}[\p{Mn}\p{Mc}]*) //match any letter, including those consisting of two code points
\p{Nd} //match any digit
Now I have the problem that users may pretend to be other users by using a username that has the same look like the one from another user (homograph attack). admin vs admin would be an example.
I guess it's not possible to easily exclude characters that are both letters and confusables using a regex but how about outside the context of the regexes. Do the unicode ids of confusables lie in certain ranges that we could filter or something like that?