I have never worked with umlauts or special characters in JavaScript strings before. My problem: how do I remove them?

For example I have this in javascript:

var oldstr = "Bayern München";
var str = oldstr.split(' ').join('-');

The result is Bayern-München, which was easy. But now I also want to remove the umlaut or any other special character, as in:

Real Sporting de Gijón.

How can I achieve this?

Kind regards,

Frank

Andy E
Frank
  • To *comment* on someone's answer, you want the "add comment" link -- if you add an answer, that person doesn't get any kind of notification and may not come back to look. The only problem I see with your code is that the first line is missing quotes. jQuery shouldn't be the problem, works for me: http://jsbin.com/axasa4 Now, literal characters can fall prey to encoding issues, so you may want to use unicode escapes instead: http://jsbin.com/axasa4/2 – T.J. Crowder Jan 26 '11 at 16:08
  • T.J., encoding shouldn't be the issue, since both 'ü' characters are written on the same page. (Agreed on comments, missing quotes, and jQuery) – Martijn Jan 26 '11 at 16:57
  • @Martijn: I'm being open to the idea of the input coming from somewhere else, people serving the page with the wrong encoding, etc., etc. – T.J. Crowder Jan 26 '11 at 17:01
  • @TJ, you're right about defensive programming in general; I was referring to this specific case. Then again, Frank didn't paste his actual code either (since it wouldn't return anything with those quotes missing). Oh, and thanks for the link to jsbin, I didn't know that one. :-) – Martijn Jan 26 '11 at 17:56
  • Sorry for the missing quotes ^^ – Frank Jan 27 '11 at 17:49
  • Related question: http://stackoverflow.com/questions/11815883/convert-non-ascii-characters-umlauts-accents-to-their-closest-ascii-equiva – Max May 14 '14 at 07:53
  • Possible duplicate of [Remove accents/diacritics in a string in JavaScript](https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript) – Mureinik Aug 03 '18 at 17:42

2 Answers

replace should be able to do it for you, e.g.:

var str = str.replace(/ü/g, 'u');

...of course ü and u are not the same letter. :-)

If you're trying to replace all characters outside a given range with something (like a -), you can do that by specifying a range:

var str = str.replace(/[^A-Za-z0-9\-_]/g, '-');

That replaces all characters that aren't English letters, digits, -, or _ with -. (The character range is the [...] bit, the ^ at the beginning means "not".) Here's a live example.
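As a minimal sketch of that range replacement applied to the string from the question (the function name `replaceOutsideRange` is just an illustrative choice, not something from the question):

```javascript
// Replace every character that is not an ASCII letter, a digit, '-' or '_' with '-'.
// Both the space and the 'ü' fall outside the range, so both become '-'.
function replaceOutsideRange(input) {
  return input.replace(/[^A-Za-z0-9\-_]/g, '-');
}

console.log(replaceOutsideRange("Bayern München")); // "Bayern-M-nchen"
```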

But that ("Bayern-M-nchen") may be a bit unpleasant for Mr. München to look at. :-) You could use a function passed into replace to try to just drop diacriticals:

var str = str.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
  // Characters that look a bit like 'a'
  if ("áàâä".indexOf(ch) >= 0) { // There are a lot more than this
    return 'a';
  }
  // Characters that look a bit like 'u'
  if ("úùûü".indexOf(ch) >= 0) { // There are a lot more than this
    return 'u';
  }
  /* ...long list of others...*/
  // Default
  return '-';
});

Live example
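If the chain of `indexOf` checks gets long, one possible variation is to keep the substitutions in a lookup object. This is only a sketch: the table `ASCII_FALLBACKS` and the function `mapOrDash` are hypothetical names, and the table is deliberately incomplete.

```javascript
// Hypothetical, deliberately incomplete table: accented character -> ASCII stand-in.
// Extend it with whatever characters your data actually contains.
var ASCII_FALLBACKS = {
  'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a',
  'ó': 'o', 'ò': 'o', 'ô': 'o', 'ö': 'o',
  'ú': 'u', 'ù': 'u', 'û': 'u', 'ü': 'u'
};

// Look up each character outside the allowed range; fall back to '-' when unknown.
function mapOrDash(input) {
  return input.replace(/[^A-Za-z0-9\-_]/g, function (ch) {
    return Object.prototype.hasOwnProperty.call(ASCII_FALLBACKS, ch)
      ? ASCII_FALLBACKS[ch]
      : '-';
  });
}

console.log(mapOrDash("Bayern München"));         // "Bayern-Munchen"
console.log(mapOrDash("Real Sporting de Gijón")); // "Real-Sporting-de-Gijon"
```

A table also makes language-specific choices easy to change: for German text you could map 'ü' to 'ue' instead of 'u' by editing a single entry.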

The above is optimized for long strings. If the string itself is short, you may be better off with repeated regexps:

var str = str.replace(/[áàâä]/g, 'a')
             .replace(/[úùûü]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');

...but that's speculative.

Note that literal characters in JavaScript strings are totally fine, but you can run into fun with encoding of files. I tend to stick to unicode escapes. So for instance, the above would be:

var str = str.replace(/[\u00e4\u00e2\u00e0\u00e1]/g, 'a')
             .replace(/[\u00fc\u00fb\u00f9\u00fa]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');

...but again, there are a lot more to do...
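A different technique from the hand-written lists above, in case it is available in your environment: `String.prototype.normalize` (ES2015 and later) can decompose an accented letter into a base letter plus a combining mark, and the marks can then be stripped. This is a sketch under that assumption; the names `stripDiacritics` and `toAsciiSlug` are made up for illustration.

```javascript
// Decompose accented letters (NFD), then drop the combining marks.
// Only covers marks in U+0300..U+036F; letters without a decomposition
// (for example 'ß' or 'ø') pass through unchanged.
function stripDiacritics(input) {
  return input
    .normalize('NFD')                 // "ü" becomes "u" followed by U+0308
    .replace(/[\u0300-\u036f]/g, ''); // remove the combining marks
}

// Strip diacritics first, then replace anything still outside the allowed range.
function toAsciiSlug(input) {
  return stripDiacritics(input).replace(/[^A-Za-z0-9\-_]/g, '-');
}

console.log(toAsciiSlug("Bayern München"));         // "Bayern-Munchen"
console.log(toAsciiSlug("Real Sporting de Gijón")); // "Real-Sporting-de-Gijon"
```

Note that this drops the mark rather than transliterating, so 'ü' becomes 'u', not the German 'ue'.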

T.J. Crowder
  • That would do it as long as character is always `ü` :) – Sarfraz Jan 26 '11 at 13:13
  • @Sarfraz: Well, that was his example -- but I was editing in a more general solution. :-) – T.J. Crowder Jan 26 '11 at 13:14
  • Note that, since Bayern-München is German, you _should_ follow German rules, and replace 'ü' with 'ue'... :-) – Martijn Jan 26 '11 at 16:55
  • @Martijn: Thanks. (I don't know virtually anything about German.) – T.J. Crowder Jan 26 '11 at 17:02
  • @Martijn Then again, as a German I'd say as long as its only Bayern-München, who cares? :-) – cg. Jul 11 '13 at 12:56
  • Is there a complete list of these characters available? – Tom Jan 01 '14 at 18:14
  • I tried your last suggestion in a bookmarklet but the characters constantly reverts to URL-encoding. It should be `var str = str.replace(/[åä]/g, 'a').replace(/[ö]/g, 'o');` but it turns into `var str = str.replace(/[%c3%a5%c3%a4]/g, 'a').replace(/[%c3%b6]/g, 'o');` and then it doesn't work. How do I prevent this? – d-b Dec 06 '19 at 13:10

There's an npm package called "remove-accents".

  1. Install the package: npm i remove-accents.
  2. Import the remove function: import { remove } from "remove-accents";
  3. Use the function: remove(inputString)
msmsms