1

I need to replace every accented character in a string with its unaccented version, for sorting. I found how to match the accented ones, but is it possible to use a regex to replace each one with its own counterpart? I mean:

var re = /ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ/g;
var str = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";
var newstr = str.replace(re, 'M');
console.log(newstr);

This prints 'M', but I need: 'uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN'

Is this possible? Thanks.

v.k.

4 Answers

3

You need to use character classes.

var re = /[ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ]/g;
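
To see why the brackets matter, here is a minimal sketch (the variable names are only illustrative). Without brackets the pattern matches the whole run of characters in that exact order, which is why the original code printed a single 'M'. With brackets, each listed character matches on its own.

```javascript
var sampleChars = "ùÙüÜ";

// No brackets: the pattern is the literal sequence "ùÙüÜ".
var literalRe = new RegExp(sampleChars, "g");

// Brackets: a character class, matching any ONE of the listed characters.
var classRe = new RegExp("[" + sampleChars + "]", "g");

console.log("ùÙüÜ".replace(literalRe, "M"));    // "M"
console.log("ùÙüÜ".replace(classRe, "M"));      // "MMMM"
console.log("Ü and ù".replace(literalRe, "M")); // "Ü and ù" (sequence not present)
console.log("Ü and ù".replace(classRe, "M"));   // "M and M"
```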

Then you can pass a function as the second argument to `replace`. That function is called once per match and returns the replacement, so it is where the conversion logic goes. A simple way to implement it is a conversion map.

E.g.

var re = /[ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ]/g;

//incomplete but you get the idea
var conversionMap = {
    'ù': 'u',
    'Ù': 'U',
    'ü': 'u',
    'Ü': 'U',
    'ä': 'a'
};

"ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ".replace(re, function (c) {
    return conversionMap[c] || c;
}); //uUuUaàáëèéïìíöòóuuúÄÀÁËÈÉÏÌÍÖÒÓUÚñÑ

FIDDLE
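
As one possible way to finish the idea, the sketch below builds the full map once from two parallel strings and wraps everything in an immediately invoked function, so neither the map nor the callback is recreated on each call. The name `removeAccents` and the internal variable names are hypothetical. Because the regex is built from the same characters as the map, every match has an entry and no `|| c` fallback is needed.

```javascript
// Hypothetical helper; the map, regex and callback are created only once.
var removeAccents = (function () {
    var accented = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";
    var plain    = "uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN";

    // Build { 'ù': 'u', 'Ù': 'U', ... } from the two parallel strings.
    var map = {};
    for (var i = 0; i < accented.length; i++) {
        map[accented.charAt(i)] = plain.charAt(i);
    }

    var re = new RegExp("[" + accented + "]", "g");

    // Cached replace callback: a plain property lookup per matched character.
    function lookup(c) {
        return map[c];
    }

    return function (str) {
        // replace returns a new string; the input is left untouched.
        return str.replace(re, lookup);
    };
})();

console.log(removeAccents("ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ"));
// uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN
```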

plalx
  • Thanks, that's part of it, now I get one 'm' for each char! Great! Would this 'logic' be another regex? I mean, if I build this in JavaScript with for()s, if()s and switch()s, there is no point in using a regex in the first place, right? Sorry if I'm dumb about this. – v.k. Oct 08 '13 at 00:37
  • Ahhh!! Great, now I get the idea. Thanks! – v.k. Oct 08 '13 at 00:39
  • Well, I thought I did. Look at http://jsfiddle.net/PT6Xc/2/, it's not working as expected. Did I miss something? – v.k. Oct 08 '13 at 00:51
  • @v.k. `replace` doesn't modify the original string, it returns a new one. Strings are immutable. http://jsfiddle.net/PT6Xc/3/ – plalx Oct 08 '13 at 00:57
  • I knew it was some stupid mistake. Thanks a lot @plalx, that did it. I'll finish the map and keep it in jsfiddle just in case. Thanks – v.k. Oct 08 '13 at 01:01
  • @v.k. Obviously you should put that logic in a reusable function ;) You can also use a self-executing function to encapsulate the whole thing without having to recreate the map on each invocation. – plalx Oct 08 '13 at 01:14
  • I did the hash map and modified it to a cached version http://jsfiddle.net/Victornpb/PT6Xc/7/ – Vitim.us Oct 08 '13 at 02:16
  • @Vitim.us, have a look at mine in the link above your comment. You can actually cache the replace callback as well, and you no longer need the logical `||` operator. – plalx Oct 08 '13 at 02:57
  • You guys are amazing! :) Learning a lot from all this. – v.k. Oct 08 '13 at 14:36
1

http://jsfiddle.net/Victornpb/YPtaN/4

var deaccentuate = (function(){

    var accent = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ",
        latin  = "uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN".split("");

    var re = new RegExp("["+accent+"]", "g");

    return function(str){
        return str.replace(re, function(c){
            return latin[accent.indexOf(c)];
        });
    };
})();

deaccentuate("Olá, como estás?"); //Ola, como estas?

Benchmark

I ran a benchmark on a 2 KB text, and my function was faster than the other answers, reaching 59,000 ops/sec.

http://jsperf.com/deaccentuate
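
For anyone who wants to repeat the comparison locally, here is a rough timing sketch. It is not the benchmark linked above: `timeIt`, `viaIndexOf`, `viaMap` and the sample text are all hypothetical, and the numbers will vary with the engine and the input, so treat the output as indicative only.

```javascript
var accentChars = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";
var latinChars  = "uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN";
var accentRe = new RegExp("[" + accentChars + "]", "g");

// Variant 1: find the position with indexOf, then read the parallel string.
function viaIndexOf(str) {
    return str.replace(accentRe, function (c) {
        return latinChars.charAt(accentChars.indexOf(c));
    });
}

// Variant 2: property lookup in a prebuilt map.
var lookupMap = {};
for (var i = 0; i < accentChars.length; i++) {
    lookupMap[accentChars.charAt(i)] = latinChars.charAt(i);
}
function viaMap(str) {
    return str.replace(accentRe, function (c) {
        return lookupMap[c];
    });
}

// Runs fn(input) `runs` times and returns the elapsed milliseconds.
function timeIt(fn, input, runs) {
    var start = Date.now();
    for (var n = 0; n < runs; n++) {
        fn(input);
    }
    return Date.now() - start;
}

// 100 copies of a short sentence as sample input.
var sampleText = new Array(101).join("Olá, como estás? ");

console.log("indexOf:", timeIt(viaIndexOf, sampleText, 1000), "ms");
console.log("map:    ", timeIt(viaMap, sampleText, 1000), "ms");
```

Running it in more than one engine is worthwhile, since relative results can differ between them.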


Vitim.us
  • Shorter, but quite inefficient. – plalx Oct 08 '13 at 00:53
  • @Vitim.us thanks, it works, but the other version is indeed faster. – v.k. Oct 08 '13 at 01:02
  • @Vitim.us :) Great! I didn't even know jsperf. Love it, thanks for that also. The difference from @plalx's version was minimal. – v.k. Oct 08 '13 at 03:17
  • Yeah, not much difference, but I just wanted to know if it was really "inefficient" as commented; that seems not to be the case. It also has a smaller footprint than the replace object. – Vitim.us Oct 08 '13 at 03:22
  • At the end of the day, you are all right, and have helped me a lot. All answers are correct, but I can only mark one as accepted... I can thank you all, though. – v.k. Oct 08 '13 at 13:00
  • @v.k. Do not forget that tests should be made in **multiple browsers**. I am not sure that you would get the same results. To me it makes no sense that `indexOf` would be faster than a simple property lookup, even if that property isn't numeric. – plalx Oct 08 '13 at 14:08
  • @plalx Yeah, I know `indexOf` will always be slower than an object lookup, but maybe the overhead/bottleneck is somewhere else. – Vitim.us Oct 08 '13 at 14:22
  • @Vitim.us Could you add the method I used in my answer in your Benchmark comparison? – Takit Isy Aug 16 '18 at 10:18
1

This is fairly verbose, in order to be readable. (Well, to each their own, anyway.)

var deaccentuate = (function() {
  var conversion =
      { 'a' : /[äàá]/g
      , 'e' : /[ëèé]/g
      , 'i' : /[ïìí]/g
      , 'o' : /[öòó]/g
      , 'u' : /[üùú]/g
      , 'n' : /ñ/g
      , 'A' : /[ÄÀÁ]/g
      , 'E' : /[ËÈÉ]/g
      , 'I' : /[ÏÌÍ]/g
      , 'O' : /[ÖÒÓ]/g
      , 'U' : /[ÜÙÚ]/g
      , 'N' : /Ñ/g
      }

  return function(str) {
    return Object.keys(conversion).reduce(function(str, c) {
      return str.replace(conversion[c], c)
    }, str)
  }
}())

Usage: (http://jsbin.com/UFEbuho/1/)

var input = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ"

console.log(deaccentuate(input))

The idea is to loop over the keys of the conversion table and replace anything that matches a key's pattern with the key itself. This is certainly not the most efficient way to do it, since it runs one replace pass per key, but unless the input strings are fairly long it shouldn't matter much.
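
If `reduce` is unfamiliar, the same loop can be written out explicitly. The sketch below uses an abbreviated table and hypothetical names (`smallTable`, `deaccentuateLoop`) purely for illustration. Each pass feeds the result of the previous replace into the next one, which is what `reduce` does with its accumulator.

```javascript
// Abbreviated conversion table, for illustration only.
var smallTable = {
    'a': /[äàá]/g,
    'i': /[ïìí]/g,
    'u': /[üùú]/g,
    'N': /Ñ/g
};

function deaccentuateLoop(str) {
    var keys = Object.keys(smallTable);
    for (var i = 0; i < keys.length; i++) {
        // One full pass over the string per key.
        str = str.replace(smallTable[keys[i]], keys[i]);
    }
    return str;
}

console.log(deaccentuateLoop("Ñandú está aquí")); // "Nandu esta aqui"
```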

Marcus Stade
  • I liked this, thanks. Also it brought my attention to reduce(). You think this has performance differences with the above solution proposed by @plalx? – v.k. Oct 08 '13 at 01:20
  • @v.k. The `reduce` part will not have much impact since looping over a few items is quite fast, however performing 12 replace operations will certainly have a negative impact. However, it will not be noticeable on small inputs, but you know, once the helper function is written, it's still one line to perform the replace operation independently of the implementation... – plalx Oct 08 '13 at 01:35
  • @plalx is absolutely right. The discussion re: performance really is moot since your input probably isn't large enough for it to make a difference and/or the implementation can change without changing the interface. – Marcus Stade Oct 08 '13 at 03:19
0

I can't think of an easier way to efficiently remove all diacritics from a string than this solution.

See it in action:

var str = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";

var str_norm = str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
console.log(str_norm);
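
A short sketch of why this works, using illustrative variable names: `normalize('NFD')` splits a precomposed letter into its base letter plus a combining mark, and the regex then removes the marks in the U+0300 to U+036F range. Letters that have no such decomposition, for example "ø", pass through unchanged, so this approach only covers characters that Unicode defines as base letter plus diacritic.

```javascript
var composed = "\u00e9";                     // "é" as a single code point
var decomposed = composed.normalize("NFD");  // "e" followed by U+0301

console.log(composed.length);                       // 1
console.log(decomposed.length);                     // 2
console.log(decomposed.charCodeAt(1).toString(16)); // "301" (combining acute accent)

// Stripping the combining mark leaves only the base letter.
var stripped = decomposed.replace(/[\u0300-\u036f]/g, "");
console.log(stripped); // "e"

// "ø" (U+00F8) has no canonical decomposition, so it is not affected.
var slashed = "\u00f8".normalize("NFD").replace(/[\u0300-\u036f]/g, "");
console.log(slashed === "\u00f8"); // true
```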
halfer
Takit Isy