
I am looking for a way in JavaScript to convert non-ASCII characters in a string to their closest ASCII equivalent, similar to what the PHP iconv function does. For instance, if the input string is `Rånades på Skyttis i Ö-vik`, it should be converted to `Ranades pa skyttis i o-vik`. I had a look at phpjs, but iconv isn't included.

Is it possible to perform such conversion in JavaScript, if so how?

Notes:

  • more generally, this conversion process is called transliteration
  • my use-case is the creation of URL slugs
Max
  • Related (but not a real blanket solution): [remove umlauts or specialchars in javascript string](http://stackoverflow.com/q/4804885) – Pekka Aug 05 '12 at 11:08
  • This *may* not be natively possible in JavaScript without maintaining huge replacement tables (at least, I've never seen a method to do it). There is no way to send the data to a server and use iconv there? – Pekka Aug 05 '12 at 11:10
  • I once created a function that does this. See http://userscripts.org/scripts/review/112070 and search (Ctrl+F) for "`var RW759_normalize_accents`". It's used to normalize characters for searches; iirc, I selected the characters manually with a tool made for that specific purpose. Based on [this Q&A](http://stackoverflow.com/questions/227950/programatic-accent-reduction-in-javascript-aka-text-normalization-or-unaccentin) – Rob W Aug 05 '12 at 11:18
  • @Pekka: don't you think that by editing the title, you've reduced the scope of the question? I initially wrote `non-ASCII characters`, which you replaced with `characters with umlauts/accents`. For me there are plenty of characters other than `umlauts` and `accents` that should also be converted: http://en.wikipedia.org/wiki/Diacritic. Maybe rephrasing the title as `Convert non-ASCII characters (umlauts,accents...) to their closest ASCII equivalent (slug creation)` would be a good compromise? – Max Aug 05 '12 at 11:31
  • @user my (non-expert's) assumption had been that all the diacritics are covered by "accents". Sure, go ahead, that sounds like a good compromise – Pekka Aug 05 '12 at 12:04
  • If your target repertoire is not just accented Latin characters, what sort of characters do you want to convert and what should they be converted into? If you can get the Unicode into Fully Decomposed form then stripping the accents should be trivial; but if some of your characters are not composed, that won't help. See also http://www.unicode.org/faq/normalization.html – tripleee Aug 05 '12 at 12:36
  • In PHP I use iconv for this, but there is a port of it to JavaScript at https://github.com/ashtuchkin/iconv-lite/tree/master/test – Ekim Aug 06 '12 at 08:51
  • Does this answer your question? [Remove accents/diacritics in a string in JavaScript](https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript) – RiZKiT Nov 11 '22 at 16:11
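
Following tripleee's point that decomposition only helps for characters that actually have a decomposed form, here is a small sketch (not from the thread; `nonDecomposable` is a hypothetical helper name) that lists which characters remain outside ASCII after `normalize('NFKD')` plus removal of the combining marks in U+0300 to U+036F. It can help decide what a manual translation table still needs to cover.

```javascript
// Returns the distinct characters that are still outside ASCII after
// NFKD decomposition and removal of combining marks (U+0300 to U+036F).
function nonDecomposable(input) {
  const stripped = input.normalize('NFKD').replace(/[\u0300-\u036f]/g, '');
  const leftovers = new Set();
  for (const ch of stripped) {          // iterates by code point, not UTF-16 unit
    if (ch.codePointAt(0) > 0x7f) leftovers.add(ch);
  }
  return [...leftovers];
}

console.log(nonDecomposable('Rånades på Skyttis i Ö-vik')); // []
console.log(nonDecomposable('Søren, Łódź, Straße'));        // [ 'ø', 'Ł', 'ß' ]
```

Letters such as `ø`, `Ł` and `ß` have no Unicode decomposition, so the normalize approach alone leaves them in place.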

3 Answers


The easiest way I've found:

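```javascript
var str = "Rånades på Skyttis i Ö-vik";
// Combining Diacritical Marks block, U+0300 to U+036F
var combining = /[\u0300-\u036F]/g;

// NFKD splits "å" into "a" + a combining ring above, which the regex then removes
console.log(str.normalize('NFKD').replace(combining, '')); // "Ranades pa Skyttis i O-vik"
```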

For reference, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
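
Since the question's use case is URL slugs, here is a sketch extending this answer into a slug function. The name `slugify` and its rules (lowercase the result, turn every run of characters other than `a-z`/`0-9` into a single hyphen, trim hyphens from both ends) are assumptions, not part of the answer.

```javascript
// Slug sketch built on the normalize() approach above.
function slugify(input) {
  return input
    .normalize('NFKD')                 // decompose "å" into "a" + U+030A
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')       // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, '');          // trim leading/trailing hyphens
}

console.log(slugify('Rånades på Skyttis i Ö-vik')); // "ranades-pa-skyttis-i-o-vik"
```

Characters without a decomposition (such as `ø` or `ß`) are not in `a-z` after this step, so they are treated as separators (`København` becomes `k-benhavn`); a translation table is still needed for those.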

mraxus
Rez

I would recommend the unidecode package; it also maps Greek and Cyrillic letters to their closest ASCII equivalents:

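```javascript
// Assumes the npm "unidecode" package (npm install unidecode)
const unidecode = require('unidecode');

unidecode('Lillı Celiné Никита Ödipus');
// → 'Lilli Celine Nikita Odipus'
```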

Adam

That's because iconv is a natively compiled UNIX utility behind most i18n character-map conversion functions.

You won't find it in JavaScript unless you access some browser component.

Encoding is a property of the document, so most JavaScript implementations simply ignore it.

You'll need a pure JS library for producing unaccented strings. Ideally, use one built for the specific language you need.

The simplest way is via translation tables or even regex replacements, like here: http://lehelk.com/2011/05/06/script-to-remove-diacritics/

Check this thread too: Replacing diacritics in Javascript
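
A minimal sketch of the translation-table idea, combined with the `normalize()` approach from the first answer: decompose first, map the letters that have no decomposition through a small table, then strip the combining marks. The table contents and the names `EXTRA_MAP` and `toAscii` are illustrative assumptions; extend the table for the languages you actually need.

```javascript
// Illustrative (not exhaustive) table for Latin letters that have no Unicode
// decomposition, so normalize('NFKD') leaves them unchanged.
const EXTRA_MAP = {
  'ø': 'o', 'Ø': 'O',
  'æ': 'ae', 'Æ': 'AE',
  'œ': 'oe', 'Œ': 'OE',
  'ß': 'ss',
  'ł': 'l', 'Ł': 'L',
  'đ': 'd', 'Đ': 'D',
  'ı': 'i',
};

// One character class built from the table keys (none are special inside [...]).
const EXTRA_RE = new RegExp('[' + Object.keys(EXTRA_MAP).join('') + ']', 'g');

function toAscii(input) {
  return input
    .normalize('NFKD')                          // "ó" -> "o" + U+0301
    .replace(EXTRA_RE, (ch) => EXTRA_MAP[ch])   // letters NFKD cannot split
    .replace(/[\u0300-\u036f]/g, '');           // drop combining marks
}

console.log(toAscii('Søren, Łódź, Straße, Æble')); // "Soren, Lodz, Strasse, AEble"
```

Normalizing before the table lookup means a letter like `ǿ` (which NFKD turns into `ø` plus a combining acute) still reaches the table entry for `ø`.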

kisp
  • I've just realized that replacing diacritics with a single `ASCII` character isn't ideal. For instance, in German `ü` should be converted to `ue` and not just `u`; see http://webmasters.stackexchange.com/questions/33032/how-to-handle-urls-with-diacritic-characters. It seems that even `iconv` doesn't do it (`php -r 'setLocale(LC_ALL,"de_DE"); echo iconv("UTF-8", "ASCII//TRANSLIT", "ü");' // -> u`), so I think I'm going to create translation tables myself (based on `iconv` and tweaked manually) and use them for both `JavaScript` and `PHP`. – Max Aug 06 '12 at 20:46
  • According to this [iconv user comment](http://nl3.php.net/manual/en/function.iconv.php#105507), iconv will convert `ü` to `ue` if the locale is set to German. – icc97 Dec 10 '13 at 13:18
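
Building on the point about German, here is a sketch of a language-specific rule layered on the generic approach. After NFKD decomposition an umlaut appears as the base letter followed by U+0308 (combining diaeresis), so `a`, `o` and `u` followed by that mark can be rewritten to `ae`, `oe` and `ue` before the remaining marks are stripped. The function name `toAsciiGerman` and the restriction to `ä`/`ö`/`ü` plus `ß` are assumptions; all-caps text is not handled (`ÜBER` becomes `UeBER`).

```javascript
// German-specific transliteration sketch (assumption: only ä/ö/ü and ß need
// special treatment; other diacritics are simply stripped).
function toAsciiGerman(input) {
  return input
    .normalize('NFKD')                    // "ü" -> "u" + U+0308
    .replace(/([aouAOU])\u0308/g, '$1e')  // umlaut -> base letter + "e"
    .replace(/ß/g, 'ss')                  // sharp s has no decomposition
    .replace(/[\u0300-\u036f]/g, '');     // strip any remaining marks
}

console.log(toAsciiGerman('Müller über Ödipus, Straße')); // "Mueller ueber Oedipus, Strasse"
```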