
I run regex checks on certain inputs on my site, but the regex fails to match when users type "fancy" Unicode characters such as:

Ⓜⓐⓣⓒⓗ Match ⒨⒜⒯⒞⒣

These are not different fonts, they are different characters! None of these are matched by /Match/ (Proof)

How can I convert the user input to standard ABC characters before running it through my regex checks? (I'm using PHP, if that makes a difference.)

Fomo

1 Answer


NFKD Unicode normalisation should take care of most of those. However, in PHP it seems to be available only when the intl extension is enabled, and I don't have it in my environment, so I can't test it. If your PHP build also lacks intl and you don't want to install it, this does something a bit similar, at least for some of the characters:

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
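For the NFKD route mentioned above, a minimal sketch might look like the following. It assumes the intl extension is loaded (it provides the `Normalizer` class); the function name `normalize_fancy` is made up for illustration, and the input is returned unchanged when intl is missing or normalisation fails.

```php
<?php
// Sketch only: relies on the intl extension for the Normalizer class.
// normalize_fancy is a hypothetical helper name, not part of PHP.
function normalize_fancy(string $text): string
{
    if (!class_exists('Normalizer')) {
        // intl is not available, so leave the input untouched.
        return $text;
    }

    // NFKD replaces compatibility characters (e.g. circled letters)
    // with their plain equivalents where Unicode defines a decomposition.
    $normalized = Normalizer::normalize($text, Normalizer::FORM_KD);

    // normalize() returns false on failure, e.g. for invalid UTF-8.
    return $normalized === false ? $text : $normalized;
}

// Usage: normalise first, then run the existing regex check.
$isMatch = preg_match('/Match/', normalize_fancy('Ⓜⓐⓣⓒⓗ')) === 1;
```

Note that NFKD is not a complete answer: the parenthesized letters such as ⒨ decompose to the letter wrapped in parentheses, so they would still need an extra clean-up step before matching.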

Finally, you can make your own mapping, for example using strtr (which you will then know to work, since you'd've written it yourself).
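A rough sketch of the hand-written mapping idea is below. It uses the two-argument form of `strtr`, which takes an array of replacement pairs and replaces whole substrings; the three-argument form works byte by byte and so is not suitable for multi-byte UTF-8 characters. The map `FANCY_MAP` and the function `fold_fancy` are hypothetical names, and the map is deliberately incomplete: it covers only the characters from the question.

```php
<?php
// Hypothetical, incomplete map: extend it with the characters you actually see.
const FANCY_MAP = [
    // Circled letters
    'Ⓜ' => 'M', 'ⓐ' => 'a', 'ⓣ' => 't', 'ⓒ' => 'c', 'ⓗ' => 'h',
    // Parenthesized letters
    '⒨' => 'm', '⒜' => 'a', '⒯' => 't', '⒞' => 'c', '⒣' => 'h',
];

function fold_fancy(string $text): string
{
    // The array form of strtr matches whole keys as substrings,
    // so multi-byte UTF-8 keys are replaced as complete characters.
    return strtr($text, FANCY_MAP);
}

// Usage: fold the input, then run a case-insensitive regex check.
$folded = fold_fancy('Ⓜⓐⓣⓒⓗ Match ⒨⒜⒯⒞⒣');
$count  = preg_match_all('/match/i', $folded);
```

The trade-off is maintenance: every new family of look-alike characters has to be added to the map by hand, whereas normalisation covers whatever Unicode defines a decomposition for.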

Amadan
  • [try it online](https://tio.run/##FdDNSgJRGIDh/VyFzKogkRaBUSRtAiEK0q5AAtvkrNr7M2aRoGd0fsTTj1ZoZFkkVjg3c25gzh1M7yweDrxzvpnhs8pWHO/mrLJlnJXKldR5qXJxuWaeFg/SWXMjZeYLx@lsdms7vZnJFE/2jwqH@WLSlVNTTkc5Y@UI5XiGvreH6GAMAc@IVrMorEZhLwrrUdgytOzcYoAZhngk9n@0dJsI0EKb2P3WUtTQRwPXRIcR5xUh3vCZjPO0L/CMHgJisNJycAMJPjroEl1G3D8tvTYnF7wq0XvAu5Z@lXOORTLObwQupvAhiT4v90dY4AlTQ4mJEkMlPpS4U2LMGq6@8IJfTDAjNlaYs5M6Jxcay2RhRJvYJNpEe5ky13eM3F4c/wM) – Nahuel Fouilleul Nov 08 '18 at 06:01
  • Your `strtr` idea doesn't seem to work: [TryItOnline](https://tio.run/##K8go@P/fxr4go4ArNTkjX6G4pKikSEP90aRt6jpQ0lFd05rL3u7/fwA) @Amadan – Fomo Nov 08 '18 at 06:59
  • @Fomo Seems you're right. Man, I hate PHP. >.< There's some info at [multibyte strtr() -> mb_strtr()](https://stackoverflow.com/questions/2758736/multibyte-strtr-mb-strtr). – Amadan Nov 08 '18 at 07:20