
I have domain names being submitted with characters like \u8236, but it is a different character every time. How can I safely remove all the bad characters without knowing in advance which ones are there?
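
One way to approach this is to find out what the characters actually are and remove only the invisible ones, rather than deleting every non-ASCII character. Note that `\u8236` is ambiguous. Read as a hex escape (as in Java or C#), it is U+8236, a CJK ideograph. If 8236 is a decimal value (for example an HTML entity `&#8236;`), it is U+202C POP DIRECTIONAL FORMATTING, an invisible character that often sneaks in through copy-paste. This minimal Python sketch uses only the standard `unicodedata` module. `describe_non_ascii`, `strip_invisible`, and the choice of the general categories `Cc` (control) and `Cf` (format) as "junk" are hypothetical assumptions, not something the question specifies.

```python
import unicodedata
from typing import List, Tuple

# Assumption: Unicode general categories treated as invisible junk.
# Cc = control characters (NUL, tab, ...); Cf = format characters such as
# U+202C POP DIRECTIONAL FORMATTING or U+200B ZERO WIDTH SPACE.
JUNK_CATEGORIES = {"Cc", "Cf"}


def describe_non_ascii(text: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list (code point, name, category) for each non-ASCII char."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


def strip_invisible(text: str) -> str:
    """Hypothetical helper: drop control/format chars, keep letters of any script."""
    kept = "".join(ch for ch in text if unicodedata.category(ch) not in JUNK_CATEGORIES)
    return kept.strip()  # also trim ordinary leading/trailing whitespace


print(describe_non_ascii("example\u202c.com"))  # [('U+202C', 'POP DIRECTIONAL FORMATTING', 'Cf')]
print(strip_invisible("example\u202c.com"))     # example.com
```

Dropping all of `Cf` is a trade-off. It also removes U+200C and U+200D (zero-width non-joiner and joiner), which some scripts legitimately need, so a stricter alternative is to reject such input instead of silently cleaning it.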

Mark Rotteveel
realPro
  • Why not encode the url before submission? – Jamshaid K. Feb 21 '21 at 09:53
  • Does this answer your question? [How to recognize if a string contains unicode chars?](https://stackoverflow.com/questions/4459571/how-to-recognize-if-a-string-contains-unicode-chars) – Jamshaid K. Feb 21 '21 at 09:57
  • `I have domain names being submitted` Why are people submitting domain names to you? To what end? – mjwills Feb 21 '21 at 11:40
  • If you are getting an HTTP request, you should have a separate page for each language. The language should be in the HTTP header. Once you know the language, you can apply an encoding associated with that language. – jdweng Feb 21 '21 at 14:18
  • You should read about localized (internationalized) domain names. Characters from various character sets should be allowed (there are also non-ASCII top-level domains). But before transmitting them, you should translate them into "ASCII-like" domain names (and back again to display them to users). Check how browsers handle such non-ASCII domain names. – Giacomo Catenazzi Feb 22 '21 at 08:10
  • @GiacomoCatenazzi thank you... yes, I already understand that localized domains are the major obstacle that prevents me from simply removing all Unicode characters. – realPro Feb 23 '21 at 21:22
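
Following the suggestion to translate internationalized names into an "ASCII-like" form, here is a minimal Python sketch using the standard library's built-in `idna` codec. That codec implements IDNA 2003 (RFC 3490) and produces the Punycode `xn--` form. Rather than guessing which characters are bad, it lets the IDNA rules decide and rejects names they prohibit. `to_ascii_domain` and `to_unicode_domain` are hypothetical helper names. If you need the newer IDNA 2008 rules, the third-party `idna` package on PyPI is an alternative; which rule set you need is an assumption the thread does not settle.

```python
from typing import Optional


def to_ascii_domain(raw: str) -> Optional[str]:
    """Hypothetical helper: return the ASCII (xn--) form of a domain, or None if invalid."""
    name = raw.strip()  # drop surrounding whitespace only
    if not name:
        return None
    try:
        # The built-in "idna" codec (IDNA 2003) runs nameprep on non-ASCII labels
        # (case folding, prohibited-character checks) and Punycode-encodes them.
        ascii_bytes = name.encode("idna")
    except UnicodeError:
        # Prohibited characters (e.g. U+202C), empty labels, or labels over 63 octets.
        return None
    # Pure-ASCII labels are passed through unchanged, so lowercase the whole
    # result; DNS names are case-insensitive.
    return ascii_bytes.decode("ascii").lower()


def to_unicode_domain(ascii_domain: str) -> str:
    """Hypothetical helper: convert an xn-- domain back to Unicode for display."""
    return ascii_domain.encode("ascii").decode("idna")


print(to_ascii_domain("bücher.example"))     # xn--bcher-kva.example
print(to_ascii_domain("example\u202c.com"))  # None
```

With this approach `bücher.example` is stored as `xn--bcher-kva.example` and `to_unicode_domain` turns it back into `bücher.example` for display. A name containing U+202C is rejected, because IDNA 2003's nameprep step prohibits bidirectional formatting characters. `str.isascii()` (Python 3.7+) is a cheap way to tell whether a name needs this conversion at all.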

0 Answers