How to convert a Unicode string to ASCII to make a nice string for a friendly URL?
- http://stackoverflow.com/questions/626792/converting-uto-u-in-javascript and http://stackoverflow.com/questions/286921/javascript-equivalent-of-xpaths-translate – Jan 08 '10 at 14:26
- and google for "transliteration" – Jan 08 '10 at 14:27
- Replacing accented characters doesn't answer this question. Characters like ㏒ (log), ‰ (per mille), € (Euro), ␀/␆ (nul/ack), ♻ (recycle), ∴ (therefore) remain unaffected by any accented letters becoming unaccented, yet are still not nice friendly url characters until they are replaced by url-safe ascii. This question is much, much broader than that one. – Benjamin Staton Jul 20 '17 at 16:06
1 Answer
Only a short list of characters can be carried through a path component of a URL without escaping. RFC 3986 defines them as the unreserved set:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
All the other characters will have to be either removed (if you're creating a "slug") or escaped.
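Removal can be done by replacing every match of the regex /[^a-zA-Z0-9._~-]/g with an empty string (the g flag makes the replacement global, and placing the hyphen last keeps it from being read as a range).
Escaping can be done with encodeURIComponent().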
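As a rough illustration of the two options above, here is a minimal sketch. The helper names stripToUnreserved and escapeForPath are made up for this example and are not part of any library; only String.prototype.replace and encodeURIComponent are built in.

```javascript
// Matches every character outside the RFC 3986 unreserved set.
// The hyphen is placed last so it is always read as a literal.
const NOT_UNRESERVED = /[^a-zA-Z0-9._~-]/g;

// Hypothetical helper: the "slug" approach, dropping unsafe characters.
function stripToUnreserved(text) {
  return text.replace(NOT_UNRESERVED, "");
}

// Hypothetical helper: the escaping approach, keeping every character
// but percent-encoding the unsafe ones as UTF-8 bytes.
function escapeForPath(text) {
  return encodeURIComponent(text);
}

// \u00e9 is a precomposed "é", \u20ac is "€".
console.log(stripToUnreserved("Caf\u00e9 au lait: 5\u20ac")); // "Cafaulait5"
console.log(escapeForPath("Caf\u00e9 au lait"));              // "Caf%C3%A9%20au%20lait"
```

Note that plain removal loses the é and the spaces entirely, which is usually why a transliteration step is wanted before it. Also, encodeURIComponent leaves a few extra characters (! ' ( ) *) unescaped in addition to the unreserved set.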
If you wish to achieve an equivalent of ICONV transliteration (that is, turning é into e and € into EUR), you'll have to write your own, although you can build on existing solutions and perhaps convert an existing transliteration table to JavaScript.
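One possible shape for a home-grown transliteration is sketched below. It is not ICONV and not the answer's own method: it swaps in Unicode NFD normalization (String.prototype.normalize) to strip accents, plus a deliberately tiny lookup table for symbols. The table contents and the name toAsciiSlug are assumptions made for illustration; a real table would be far larger.

```javascript
// Hypothetical, intentionally incomplete table for characters that
// have no accent-stripped form.
const TRANSLIT = new Map([
  ["\u20ac", "EUR"], // €
  ["\u00df", "ss"],  // ß
  ["\u00e6", "ae"],  // æ
]);

// Hypothetical helper: transliterate, then reduce to a lowercase slug.
function toAsciiSlug(text) {
  return text
    .normalize("NFD")                      // split é into e + combining accent
    .replace(/[\u0300-\u036f]/g, "")       // drop the combining accents
    .replace(/[^\x00-\x7f]/g, (ch) => TRANSLIT.get(ch) || "") // table lookup, else drop
    .toLowerCase()
    .replace(/[^a-z0-9._~-]+/g, "-")       // runs of unsafe characters become one hyphen
    .replace(/^-+|-+$/g, "");              // trim leading and trailing hyphens
}

console.log(toAsciiSlug("Crème brûlée 5€")); // "creme-brulee-5eur"
console.log(toAsciiSlug("♻ recycle ∴"));     // "recycle"
```

Characters missing from the table, such as ♻ or ∴, are simply dropped here; whether to drop them, spell them out, or percent-encode them instead is a design choice that depends on the site.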

Breton

Victor Nicollet