12

I'm wondering how it's possible to 'translate' UTF-8 characters to their closest ASCII equivalents using JavaScript, just like iconv does in PHP.

Example:

ü becomes u
ó becomes o

I'd rather not use a replace, because a) it requires a complete set of characters, which is a lot of work, and b) it would be hard to get a complete set of characters, and I'll never be certain whether I'm missing one or two.

Simon
  • When you say "UTF-8", do you *really* mean UTF-8? Like, a string of "characters" whose elements are not actually characters at all, but simply UTF-8 code-units promoted to 16 bits? Or do you mean "Unicode", i.e., a normal JavaScript UTF-16 string? – ruakh Nov 09 '12 at 14:12
  • It's possible, but there's no algorithmic way to do it besides having a map from Unicode values to whatever ASCII "equivalent" you (or somebody) think to be appropriate. Note that a UTF-8 string may include code points for *many* very different alphabets. – Pointy Nov 09 '12 at 14:12
  • The term for this is 'transcription' or 'transliteration', there are probably some libraries out there. – kapex Nov 09 '12 at 14:29

3 Answers

15

The easiest way I've found:

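var str = "üó";
// Combining Diacritical Marks block (U+0300 to U+036F)
var combining = /[\u0300-\u036F]/g;

// NFKD splits each accented letter into a base letter plus combining marks; replace() then strips the marks
console.log(str.normalize('NFKD').replace(combining, '')); // "uo"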

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

superhero
Rez
  • Only drawback: it is part of ECMAScript 6 and doesn't work in all browsers – Pierre-Olivier Vares Oct 03 '14 at 16:28
  • Another drawback: only covers combining marks, not full transliteration. – cmbuckley Nov 23 '15 at 20:33
  • See the comments on the accepted answer to this similar SO question for why the code above won't always work; they list letters that it fails to replace: https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript – iaforek Jan 10 '20 at 12:57
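To make the drawback from the comments concrete, here is a minimal sketch of the normalize approach together with a check for what it leaves behind. The helper names `stripDiacritics` and `remainingNonAscii` are made up for illustration and are not part of the answer. Letters such as ø, ß and ł have no Unicode decomposition into a base letter plus a combining mark, so NFKD leaves them untouched.

```javascript
// Hypothetical helper (not from the answer): decompose with NFKD,
// then remove the combining marks in U+0300 to U+036F.
function stripDiacritics(str) {
  return str.normalize('NFKD').replace(/[\u0300-\u036F]/g, '');
}

// Hypothetical helper: list the characters that are still outside ASCII
// after stripping, i.e. the ones this technique cannot handle.
function remainingNonAscii(str) {
  return Array.from(stripDiacritics(str)).filter(function (ch) {
    return ch.codePointAt(0) > 0x7F;
  });
}

console.log(stripDiacritics('üó'));         // "uo"
console.log(remainingNonAscii('ø ß ł ü'));  // [ 'ø', 'ß', 'ł' ]
```

Running `remainingNonAscii` over a sample of your real data is a cheap way to find out whether normalization alone is enough or whether a lookup table is needed as well.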
11

As @Pointy said, your only option is to map/replace characters according to a dictionary.

You'll find this really useful: https://github.com/backbone-paginator/backbone.paginator/blob/a579796a30e583c4dfa09e0a86e4abd21e0b5b56/plugins/diacritic.js
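For reference, the dictionary approach boils down to something like the sketch below. The map here is a tiny, deliberately incomplete sample made up for illustration (the linked file is far more thorough), and `transliterate` is a hypothetical name. It also assumes the input uses precomposed characters such as U+00FC for ü.

```javascript
// Illustrative, deliberately incomplete map (an assumption, not the linked plugin's table).
var ASCII_MAP = {
  'ü': 'u', 'ó': 'o', 'ç': 'c', 'ø': 'o',
  'ß': 'ss', 'æ': 'ae', 'ł': 'l'
};

// Hypothetical helper: replace every non-ASCII character that has a map entry.
// Characters without an entry are left unchanged so missing entries stay visible.
function transliterate(str) {
  return str.replace(/[^\x00-\x7F]/g, function (ch) {
    return Object.prototype.hasOwnProperty.call(ASCII_MAP, ch) ? ASCII_MAP[ch] : ch;
  });
}

console.log(transliterate('üó Straße')); // "uo Strasse"
```

Leaving unmapped characters in place, rather than dropping them, is a design choice: it makes it easy to spot which entries are still missing from the map.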

timing
alexandernst
2

There is now an iconv package for Node.js (native bindings to libiconv rather than a pure-JS port): https://www.npmjs.com/package/iconv

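var Iconv = require('iconv').Iconv;
var iconv = new Iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE');
// convert() returns a Buffer, so call toString() to get a string
iconv.convert('ça va が').toString(); // "ca va "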
Madarco