
Let's say I have a regexp that looks like:

\w+

Then this string would pass:

helloworld

However this won't:

héllowörld

It will stop at é (and the ö will break it as well), even though to a human héllowörld doesn't seem so far-fetched as a single word.
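A quick way to see this, sketched in JavaScript (the question doesn't name a language, so that is an assumption): match with \w+ returns the first run of word characters, and in "héllowörld" that run ends right before the é.

```javascript
// Compare what \w+ matches in an ASCII-only word and in an accented one.
const pattern = /\w+/;

const ascii = "helloworld".match(pattern)[0];
const accented = "héllowörld".match(pattern)[0];

console.log(ascii);    // "helloworld"
console.log(accented); // "h": the match stops at "é"
```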

Is there a way I can extend \w so that it also includes accented word characters? Or do I have to append every special Latin character to my regexp, like this:

[\wéèåöä...........]+

That doesn't seem like a good option, because I would have to figure out every special Latin character in the world that could reasonably appear in a word.
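To illustrate the drawback, here is a JavaScript sketch of the hand-written class that keeps only the five accented letters actually shown in the question (the ... stays elided). It accepts héllowörld but stops at any accented letter that wasn't listed, such as the ï in naïve.

```javascript
// The question's class, limited to the letters it actually lists.
const listed = /[\wéèåöä]+/;

const first = "héllowörld".match(listed)[0];
const second = "naïve".match(listed)[0];

console.log(first);  // "héllowörld": é and ö are in the list
console.log(second); // "na": ï was never added to the class
```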

What options do I have?

corgrath

2 Answers


\w matches any word character, which in JavaScript (and by default in many other regex flavors) means only [a-zA-Z0-9_]. It doesn't match non-English letters such as é or ö.

For more options, see the post "Regular expression to match non-english characters?".
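One option not mentioned in this answer, assuming an ES2018+ JavaScript engine, is Unicode property escapes. The class below is a sketch of a Unicode-aware stand-in for \w (letters, combining marks, numbers and underscore); it is an illustration, not something taken from the linked post.

```javascript
// Assumes an ES2018+ engine (a current Node.js or browser), which supports
// Unicode property escapes when the "u" flag is set.
//   \p{L} any letter in any script
//   \p{M} any combining mark (an accent stored as a separate code point)
//   \p{N} any number
const unicodeWord = /[\p{L}\p{M}\p{N}_]+/u;

const words = ["helloworld", "héllowörld", "naïve", "привет"];
const matched = words.map((word) => word.match(unicodeWord)[0]);
console.log(matched); // each word is matched in full

// The "u" flag on its own does not widen \w: it still stops at "é".
const plainW = "héllowörld".match(/\w+/u)[0];
console.log(plainW); // "h"
```

Unlike a hand-maintained list, \p{L} covers accented Latin letters as well as Cyrillic, Greek, CJK and other scripts.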

Braj

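Sometimes I use an inverse method: instead of listing the non-English characters I want, I exclude the characters I don't want, so non-English characters are matched along with everything else. Check this out: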

var string = "你好 κόσμος привет šđčߣłćž çë asgfgrtzj 657 #$%&/()=?*!";

The pattern below

var pattern = /([^0-9]+)/gi;

excludes all the numbers, leaving:

你好 κόσμος привет šđčߣłćž çë asgfgrtzj #$%&/()=?*!
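The answer doesn't show how the pattern is applied. A minimal sketch, assuming the matches are collected with String.prototype.match (which returns every match when the g flag is set) and joined back together:

```javascript
const string = "你好 κόσμος привет šđčߣłćž çë asgfgrtzj 657 #$%&/()=?*!";

// Pattern copied from the answer. The capture group and the "i" flag are
// harmless but not needed: case-insensitivity has no effect on digits.
const pattern = /([^0-9]+)/gi;

// With the "g" flag, match() returns every run of non-digit characters.
const pieces = string.match(pattern);
const result = pieces.join("");

console.log(pieces); // two runs: the text before "657" and the text after it
console.log(result);
```

Because the spaces on both sides of 657 survive, the joined string contains two spaces where the number used to be. string.replace(/[0-9]+/g, "") produces the same result in one step.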

Adding the special characters from the string above to the excluded set:

var pattern = /([^0-9#$%&/()=?*!]+)/gi;

makes the final string look like this:

你好 κόσμος привет šđčߣłćž çë asgfgrtzj 
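A minimal sketch of the second pattern, assuming (the answer doesn't say) that the matches are collected with String.prototype.match and joined. The space between 657 and # is not excluded, so it comes back as a separate match; trimming the joined result removes it.

```javascript
const string = "你好 κόσμος привет šđčߣłćž çë asgfgrtzj 657 #$%&/()=?*!";

// Pattern from the answer: exclude digits and the listed punctuation.
// Inside a character class, "/" needs no escaping in a regex literal.
const pattern = /([^0-9#$%&/()=?*!]+)/gi;

const pieces = string.match(pattern);
console.log(pieces); // the leading text, plus the lone space before "#"

// Joining leaves trailing spaces behind, so trim them off.
const result = pieces.join("").trim();
console.log(result);
```

The flip side of this approach is that any character you forget to exclude, such as a comma or a dash, stays in the result.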
hex494D49