I need to map far-out Unicode Latin substitutes to their respective base letters.
— should match 4× “Overflow”
These examples from Mathematical Alphanumeric Symbols already cover a considerable number of variants, but there are also the Enclosed Alphanumeric Supplement, Halfwidth and Fullwidth Forms, and many more blocks, so there’s no point in writing up a mapping manually.
It’s not about diacritics (as in questions like this or this, more like this). I also don’t want to merely test for occurrence, so even if it were supported, \p{Symbol} wouldn’t help.
If I am looking for U+0041 LATIN CAPITAL LETTER A, I also want to match U+1D49C MATHEMATICAL SCRIPT CAPITAL A as well as U+FF21 FULLWIDTH LATIN CAPITAL LETTER A and, say, every other derivative with “[LATIN] LETTER A” in its name.
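To make the expected behaviour concrete, here is a minimal sketch in Python (not one of my target platforms). It assumes that the compatibility decomposition recorded in the Unicode Character Database, applied through NFKC normalization, is the relevant attribute. The helper name `fold_to_base` is hypothetical, and I have not checked whether every variant block carries such a decomposition.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Fold compatibility variants to their base characters.

    Assumption: every variant of interest has a compatibility
    decomposition (tags such as <font> or <wide>), so NFKC is enough.
    """
    return unicodedata.normalize("NFKC", text)


# U+1D49C MATHEMATICAL SCRIPT CAPITAL A, U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
for ch in ("\U0001D49C", "\uFF21"):
    # decomposition() returns the raw mapping field, e.g. a tag plus code points
    print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch), "->", fold_to_base(ch))

# "Overflow" written with fullwidth letters
fullwidth = "\uFF2F\uFF56\uFF45\uFF52\uFF46\uFF4C\uFF4F\uFF57"
print(fold_to_base(fullwidth) == "Overflow")
```

Characters without a compatibility decomposition would pass through unchanged here, so this only illustrates the kind of lookup I mean, not a complete solution.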
Do Unicode code points carry some sort of attribute that points to the base character they were derived from, and is there a way to evaluate it programmatically (T-SQL, .NET)?