2

I have a problem with JavaScript's RegExp not recognizing non-ASCII letters such as ä or ß as word characters (or worse, treating the position next to them as a word boundary, \b):

"wäww, xöxx  yüyy zßzz".replace(/\b\w/g,function(m){return m.toUpperCase();})

should return

"Wäww, Xöxx  Yüyy Zßzz"

but unfortunately returns:

"WäWw, XöXx  YüYy ZßZz"

I played with several encodings but that didn't help...

How can I make it recognize the characters or otherwise work around that problem?

There is a question with a similar problem with no satisfying answer.

Christoph

7 Answers

2

Cheat

Instead of trying to work around the nuances of Unicode and JS, just use the space as the marker for your replace/capitalize logic:

> "wäww, xöxx  yüyy zßzz".replace(/( |^)[^ ]/g,function(m){return m.toUpperCase();});
"Wäww, Xöxx  Yüyy Zßzz"

OR

> "wäww, xöxx  yüyy zßzz".replace(/(\s|^)[^ ]/g,function(m){return m.toUpperCase();});
"Wäww, Xöxx  Yüyy Zßzz"

OR

> "wäww, xöxx  yüyy zßzz".replace(/([\s\.,:;]|^)[^ ]/g,function(m){return m.toUpperCase();});
"Wäww, Xöxx  Yüyy Zßzz"

This kind of technique will correctly capitalize accented chars:

> "wäww, öhyes".replace(/( |^)[^ ]/g,function(m){return m.toUpperCase();})
"Wäww, Öhyes"
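
The same idea can be wrapped in a reusable function. This is only a minimal sketch: the name `capitalizeWords` is hypothetical, and it swaps `[^ ]` for `\S` so that a tab or newline following the boundary is not mistaken for the first letter. It still assumes that words are separated by whitespace only.

```javascript
// Hypothetical helper: uppercase the first non-whitespace character
// at the start of the string or directly after any whitespace.
function capitalizeWords(text) {
  return text.replace(/(^|\s)\S/g, function (m) {
    return m.toUpperCase();
  });
}

console.log(capitalizeWords("wäww, xöxx  yüyy zßzz")); // "Wäww, Xöxx  Yüyy Zßzz"
```

Words that follow punctuation without whitespace in between, such as the second part of "a-b", are left untouched by this approach.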
AD7six
  • Well, sometimes you are stuck on a problem and don't see the easy solution^^ I finally went with this: `/(?:\s|^)[\wäöüß]/g` – Christoph Oct 27 '12 at 20:30
1

I chose to attack the problem from a different perspective: How can I get the first letter of each word?

Here's what I came up with:

"wäww, xöxx  yüyy zßzz".replace(/(?:^| )[^ ]/g,function(m){return m.toUpperCase();});

Returns:

"Wäww, Xöxx  Yüyy Zßzz"
ohaal
1

If you only need it for presentation and not for further processing in JavaScript, setting the CSS style

text-transform: capitalize;

on the element would work.

Jan
  • That's a much better suggestion :) – AD7six Oct 27 '12 at 19:44
  • I don't mean to be rude, but do you see any css-tag? – Christoph Oct 27 '12 at 20:15
  • 4
    Are you sure that it never occurred to the OP that CSS could be used to achieve his goal? Surely a trivial CSS style tag would be favorable over a convoluted Javascript replacement script, IF it would be possible. Just throwing it out there. – Jan Oct 27 '12 at 20:20
0

You'll have to modify your regex:

"wäww, xöxx  yüyy zßzz".replace(/[\wäüßö]/g,"x")
vyakhir
0

A simple way would be to invert your expression and manually list all non-word characters (not nice, but it works):

/[^\s,\.;+\- and much more]/g

If you know all possible word characters, you could list them instead:

/[\wäöüßÄÖÜ and much more]/g
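
As a rough sketch of the second variant, an explicit character class can stand in for both `\w` and `\b` when capitalizing words. The names `GERMAN_WORD_CHARS`, `germanWordStart`, and `capitalizeGerman` are assumptions made for illustration, and the letter list only covers German; extend it for other languages.

```javascript
// Assumed set of word characters: ASCII \w plus the German letters listed above.
var GERMAN_WORD_CHARS = "\\wäöüßÄÖÜ";

// A word start is a word character preceded by the start of the string
// or by a character that is not in the word-character set.
var germanWordStart = new RegExp(
  "(^|[^" + GERMAN_WORD_CHARS + "])([" + GERMAN_WORD_CHARS + "])",
  "g"
);

function capitalizeGerman(text) {
  return text.replace(germanWordStart, function (m, before, first) {
    // Keep the preceding character as-is and uppercase only the first letter.
    return before + first.toUpperCase();
  });
}

console.log(capitalizeGerman("wäww, xöxx  yüyy zßzz")); // "Wäww, Xöxx  Yüyy Zßzz"
```

Unlike the whitespace-based variants, this also capitalizes words that follow punctuation directly, because any character outside the class counts as a boundary.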

bukart
0

Search for `[^a-zA-Z ,]|[a-zA-Z]` and replace with x.

You can see this working here.

pogo
  • 1
    Bad idea - that would also replace every other special (non-word) character. – Christoph Oct 27 '12 at 19:30
  • That's what I thought you needed. You need to replace every character with x ? – pogo Oct 27 '12 at 19:30
  • That does not make sense - I would just write `\w\W` and that would replace everything which I don't want. Also I rephrased my question because my example didn't account for my original problem. – Christoph Oct 27 '12 at 19:32
  • \w\W would replace spaces, commas and everything else. In your original question, you wanted the output to be "xxxx, xxxx xxxx xxxx" – pogo Oct 27 '12 at 19:33
  • `[^a-zA-Z ,]|[a-zA-Z]` is practically the same as `\w\W` – Christoph Oct 27 '12 at 20:16
0

JavaScript regular expressions treat \w as matching ASCII letters, the digits 0-9, and the underscore character only. In general, JavaScript regexps play in the ASCII world.
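
The ASCII-only behaviour is easy to check directly. The snippet below is a small demonstration of why the example in the question capitalizes the letter after each umlaut:

```javascript
// \w is equivalent to [A-Za-z0-9_], so non-ASCII letters do not match it.
console.log(/\w/.test("a")); // true
console.log(/\w/.test("ä")); // false

// Because "ä" counts as a non-word character, \b sees a boundary after it,
// so the "w" that follows is treated as the start of a new word.
console.log("wäww".match(/\b\w/g)); // ["w", "w"]
```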

If you have a small number of “special” characters to deal with, you can code them separately, but in general, you should look for libraries that can handle the situation more generally, as suggested in answers to the question Javascript + Unicode regexes mentioned by @Pumbaa80 in a comment.
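
In engines that support ES2018 Unicode property escapes (an assumption, since they postdate this answer), a library may not be needed: with the `u` flag, `\p{L}` matches any Unicode letter and `\p{N}` any Unicode number. A minimal sketch, using the hypothetical names `unicodeWordStart` and `capitalizeUnicode`:

```javascript
// Requires the `u` flag and ES2018 property escapes.
// A word start is a letter preceded by the start of the string or by a
// character that is not a letter, a number, or an underscore.
const unicodeWordStart = /(^|[^\p{L}\p{N}_])(\p{L})/gu;

function capitalizeUnicode(text) {
  return text.replace(
    unicodeWordStart,
    (m, before, first) => before + first.toUpperCase()
  );
}

console.log(capitalizeUnicode("wäww, xöxx  yüyy zßzz")); // "Wäww, Xöxx  Yüyy Zßzz"
```

The capture group for the preceding character is used instead of a lookbehind so that the sketch relies on property escapes alone.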

Jukka K. Korpela