I am developing an application for a European client, and they use their native character set.
Now I need a regex that allows non-English characters like eéèêë
etc., and I am not sure how this can be done.
Any suggestions?
If all you want to match is letters (including "international" letters), you can use \p{L}.
You can find some information on regex and Unicode here.
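As a rough illustration of the \p{L} approach, here is a minimal JavaScript sketch. It assumes an engine that supports Unicode property escapes (in JavaScript they only work with the u flag), and the helper name isLettersOnly is made up for this example.

```javascript
// Hypothetical helper: true when the text consists only of Unicode letters.
// In JavaScript, \p{L} is only recognized when the `u` flag is set.
const lettersOnly = /^\p{L}+$/u;

function isLettersOnly(text) {
  return lettersOnly.test(text);
}

console.log(isLettersOnly("eéèêë"));  // accented letters are letters
console.log(isLettersOnly("abc123")); // digits are not in \p{L}
```

If digits, spaces, or hyphens should be allowed too, they have to be added to the pattern explicitly, for example as a character class like [\p{L}\d -].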
If you want to match the accented Latin letters used by most Western European languages in virtually any regular expression engine, try:
[A-Za-zŽžÀ-ÿ]
It matches any of the following characters from the printable ASCII and "extended ASCII" sets (note that the range À-ÿ also contains the symbols × and ÷):
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Matches {char} (character index in the Windows-1252 "extended ASCII" code page, case sensitive):
char(s) | index(start) | index(end) |
---|---|---|
[A-Z] | 65 | 90 |
[a-z] | 97 | 122 |
Ž | 142 | --- |
ž | 158 | --- |
[À-ÿ] | 192 | 255 |
Test it at https://regex101.com/r/Xbbtm1/1
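To make the behaviour of that class concrete, here is a small JavaScript sketch. The variable names are invented for the example, and the second, stricter pattern is my own variation (not part of the answer above) that leaves out × and ÷ by splitting the range around them.

```javascript
// The class from the answer, anchored so the whole string must match.
const latinWithAccents = /^[A-Za-zŽžÀ-ÿ]+$/;

// Assumed variation: the same class with × (U+00D7) and ÷ (U+00F7) left out.
const latinLettersOnly = /^[A-Za-zŽžÀ-ÖØ-öø-ÿ]+$/;

console.log(latinWithAccents.test("crème")); // true
console.log(latinWithAccents.test("Łódź"));  // false: Ł and ź lie outside À-ÿ
console.log(latinWithAccents.test("×"));     // true: × sits inside À-ÿ
console.log(latinLettersOnly.test("×"));     // false
```

Letters outside the Latin-1 range (Polish Ł, Turkish ş, and so on) are not covered, so this only fits when the expected languages are known in advance.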
\p{L} isn't cross-browser yet. Transpiling down from this will give you massively bloated code if you use it a lot.
Here is a short and sweet way to include non-ASCII letters in general that doesn't add a gazillion lines of JavaScript or plugins. Replace a-zA-Z0-9 or \w in your regex with this, and don't use the u flag:
\u00BF-\u1FFF\u2C00-\uD7FF\w
Inserted into all my JavaScript regexes in place of a-zA-Z0-9 or \w, this seems to do the job. My context was recognizing UTF-8 text in HTML and CSS, and it had to be cross-browser.
I can't believe it is this simple, so I am waiting to be proved wrong, after a day of searching for something that works in Firefox...
I've only tested this using Japanese hiragana with a French accent.
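A quick sketch of how those ranges behave when placed inside a character class. The name wordLike is made up, and the sample strings are mine rather than the original test data.

```javascript
// The ranges from above, wrapped in a character class and anchored.
// No `u` flag, so the ranges are compared per UTF-16 code unit.
const wordLike = /^[\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/;

console.log(wordLike.test("café"));     // true: é is U+00E9
console.log(wordLike.test("ひらがな")); // true: hiragana is U+3040 to U+309F
console.log(wordLike.test("naïve_1"));  // true: \w covers _ and digits
console.log(wordLike.test("a b"));      // false: space is not included
console.log(wordLike.test("😀"));       // false: surrogates start at U+D800
```

Note that the ranges are broad: they also let through non-letter symbols such as ¿ (U+00BF) and × (U+00D7), and they miss anything outside the Basic Multilingual Plane.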
[e\xE8\xE9\xEA\xEB]
will match any one of eéèêë
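For example, in JavaScript (a sketch only; countEVariants is a hypothetical helper name), the class can be used to count every e-like character in a string:

```javascript
// Hypothetical helper: counts e, è, é, ê and ë in the given text.
// \xE8 to \xEB are the code points of è, é, ê and ë.
function countEVariants(text) {
  const matches = text.match(/[e\xE8\xE9\xEA\xEB]/g);
  return matches ? matches.length : 0;
}

console.log(countEVariants("élève entêté")); // 6
console.log(countEVariants("xyz"));          // 0
```

The class is case sensitive, so the uppercase forms (E, È, É, Ê, Ë) would need to be listed as well or matched with the i flag.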