
How to convert a Unicode string to ASCII to make a nice string for a friendly URL?

Gajus
complez
  • http://stackoverflow.com/questions/626792/converting-uto-u-in-javascript http://stackoverflow.com/questions/286921/javascript-equivalent-of-xpaths-translate –  Jan 08 '10 at 14:26
  • and google for "transliteration" –  Jan 08 '10 at 14:27
  • Replacing accented characters doesn't answer this question. Characters like ㏒ (log), ‰ (per mille), € (Euro), ␀/␆ (nul/ack), ♻ (recycle), ∴ (therefore) are unaffected when accented letters lose their accents, yet they are still not friendly URL characters until they are replaced with URL-safe ASCII. This question is much, much broader than that one. – Benjamin Staton Jul 20 '17 at 16:06

1 Answer


Only a short list of characters can be carried unescaped in a path component of a URL: the "unreserved" set defined in RFC 3986.

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
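The ABNF rule above maps directly onto a JavaScript character class. A minimal sketch (the helper name isUnreserved is hypothetical, not from any library) that tests whether a string already consists only of unreserved characters and can therefore go into a path segment as-is:

```javascript
// RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~".
// The hyphen is placed last in the class so it is read as a literal "-".
const UNRESERVED = /^[A-Za-z0-9._~-]*$/;

// Hypothetical helper: true if every character of `s` is unreserved.
function isUnreserved(s) {
  return UNRESERVED.test(s);
}

console.log(isUnreserved("my-page_v2.1~draft")); // true
console.log(isUnreserved("café"));               // false ("é" is not ASCII)
console.log(isUnreserved("a b"));                // false (a space must be escaped)
```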

All the other characters will have to be either removed (if you're creating a "slug") or escaped.

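Removal can be done with the regex /[^a-zA-Z0-9._~-]/g, for example str.replace(/[^a-zA-Z0-9._~-]/g, ""). Keep the g flag: without it, replace() removes only the first offending character.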
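The removal approach could be wrapped into a small slug builder. This is a sketch under stated assumptions: the name toSlug is hypothetical, and turning whitespace into hyphens and lowercasing are common slug conventions rather than anything the answer requires.

```javascript
// Hypothetical helper: build a URL slug by deleting every character
// outside the RFC 3986 unreserved set.
function toSlug(text) {
  return text
    .trim()
    .replace(/\s+/g, "-")             // assumption: whitespace runs become "-"
    .replace(/[^a-zA-Z0-9._~-]/g, "") // "g" so every match is removed, not just the first
    .replace(/-+/g, "-")              // collapse runs of hyphens left by removals
    .replace(/^-|-$/g, "")            // drop a leading or trailing hyphen
    .toLowerCase();                   // assumption: slugs are lowercase
}

console.log(toSlug("Hello, World!"));        // "hello-world"
console.log(toSlug("  Prices in € today ")); // "prices-in-today"
console.log(toSlug("café"));                 // "caf"
```

The last line shows the weakness of plain removal: accented letters vanish instead of degrading to their ASCII base letter, which is where transliteration comes in.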

Escaping can be done with encodeURIComponent().
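For the escaping route, encodeURIComponent() percent-encodes each character as its UTF-8 bytes. It leaves ! ' ( ) * unescaped, and those are not in the unreserved set, so if a strictly unreserved-only result is wanted, a small wrapper can escape them too. The wrapper name encodeRfc3986 is hypothetical.

```javascript
// encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) untouched and
// percent-encodes everything else as UTF-8 bytes.
console.log(encodeURIComponent("café €")); // "caf%C3%A9%20%E2%82%AC"

// Hypothetical helper: additionally escape ! ' ( ) *, which are legal in
// a path but are not RFC 3986 "unreserved" characters.
function encodeRfc3986(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeRfc3986("it's (fun)*")); // "it%27s%20%28fun%29%2A"
```

Unlike removal, escaping is lossless: decodeURIComponent() restores the original string, at the cost of a less readable URL.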

If you want an equivalent of iconv transliteration (that is, turning é into e and € into EUR), you'll have to implement it yourself, although you can leverage existing solutions and perhaps convert an existing transliteration table into a JS object.
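A rough do-it-yourself transliteration might combine two steps. This is a sketch, not iconv's algorithm: it swaps in Unicode normalization (String.prototype.normalize) to strip accents, and falls back to a small hand-written table for characters such as € that have no ASCII decomposition. The name transliterate and the table entries are hypothetical examples; characters missing from the table are simply dropped.

```javascript
// Hypothetical table for characters that normalization cannot reduce to
// ASCII. Extend as needed; an existing transliteration table could be
// converted into this shape.
const TRANSLIT = {
  "€": "EUR",
  "‰": "permille",
  "ß": "ss",
  "æ": "ae",
  "ø": "o",
};

function transliterate(text) {
  return text
    // Decompose accented letters: "é" becomes "e" + U+0301 (combining acute).
    .normalize("NFD")
    // Drop the combining diacritical marks (U+0300 to U+036F).
    .replace(/[\u0300-\u036f]/g, "")
    // Map each remaining non-ASCII code point via the table, or drop it.
    .replace(/[^\x00-\x7f]/gu, (ch) => TRANSLIT[ch] ?? "");
}

console.log(transliterate("Crème brûlée: 5 €")); // "Creme brulee: 5 EUR"
console.log(transliterate("Straße"));            // "Strasse"
```

The result can still contain spaces and punctuation, so it would then go through the removal or escaping step described above. Using "NFKD" instead of "NFD" would also fold compatibility characters, for example the ligature "ﬁ" into "fi".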

Breton
Victor Nicollet