14

I need to repeat a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ several times in my regex, but I find this very ugly. So I tried \p{L}, but it does not work in JavaScript.

Any ideas?

My current regex: [a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ][a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ' ,"-]*[a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ'",]+

I would like to have something like this: [\p{L}][\p{L}' ,"-]*[\p{L}'",]+ (or at least something shorter than the current expression).

charles Lgn
  • 1
    I find it hard to understand the question. Do you want to match multiple occurrences of that character set? Can you provide an example of text that should be matched by your regex? – Francesco May 04 '18 at 15:47
  • You can use a regex library that handles non-latin letters better like [XRegExp](http://xregexp.com/) – VLAZ May 04 '18 at 15:47
  • Actually, I'd like to write something like ` [\p{L}][\p{L}' ,"-]*[\p{L}'",]+ ` instead of ` [a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ][a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ' ,"-]*[a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ'",]+ ` – charles Lgn May 04 '18 at 15:52
  • That's ugly, but it's arguably the best solution. Don't worry about performance; it's all the same. – revo May 04 '18 at 15:58
  • If I forget a character that I need (like î or ò) and I don't test every character, how can I be sure I haven't missed one? Moreover, I need to use this set many times, so my expressions become unreadable, and if I come back to them later I may not understand why they are so long. – charles Lgn May 04 '18 at 16:07
  • 1
    Finally I used this: `^(?!.*\/\/)[A-zÀ-ž][A-zÀ-ž\/]*[A-zÀ-ž-'" ]*[A-zÀ-ž'"]$` – charles Lgn May 15 '18 at 09:13
  • Add the "u" flag to your regex for \p{L} to work. The official JS Guide says it clearly: "For Unicode property escapes to work, a regular expression must use the u flag". – acortad Dec 31 '22 at 00:40

2 Answers

7

What you actually need is a subset of what you asked for. First, you should define which set of characters you need. \pL means every letter from every language.

It's kind of ugly, but it doesn't affect performance, and it is arguably the best way to work around this kind of problem in JS. ECMAScript 2018 adds support for \pL, but it is still far from being implemented by all major browsers.

If it's a matter of taste, you could reduce the ugliness a bit:
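
For engines that do implement Unicode property escapes, the pattern from the question can be written almost as asked. This is only a minimal sketch: it assumes the `u` flag is available, it uses the braced form `\p{L}`, and the `^...$` anchors and the `isValidName` helper are illustrative additions, not part of the original regex.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes.
// \p{L} only works when the regex carries the `u` flag.
// The ^ and $ anchors are added here to validate a whole string;
// the regex in the question has no anchors.
const unicodeNameRe = /^\p{L}[\p{L}' ,"-]*[\p{L}'",]+$/u;

// Hypothetical helper name, used only for this illustration.
function isValidName(text) {
  return unicodeNameRe.test(text);
}

console.log(isValidName("Éloïse d'Été")); // true
console.log(isValidName("Łukasz"));       // true: Ł is a letter even though it is not in the hand-written list
console.log(isValidName("1abc"));         // false: must start with a letter
```

Note that, like the original, this pattern needs at least two characters, because the first class and the final `+` group each consume at least one.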

var characterSet = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp('[' + characterSet + ']' + '[' + characterSet + '\' ,"-]*' + '[' + characterSet + '\'",]+');

Credit for this update goes to @Francesco:

var pCL = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp(`[${pCL}][${pCL}' ,"-]*[${pCL}'",]+`);
console.log(re.source);
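
If the same set is needed in many expressions, the interpolation above could be wrapped in a small function so the long list lives in one place. A rough sketch, assuming whole-string validation is wanted: `buildNameRegex` and `LETTERS` are hypothetical names, and the `^...$` anchors are an addition.

```javascript
// The character list from the question, kept in a single place.
const LETTERS = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';

// Hypothetical helper: builds the three-part pattern from any letter set.
// The anchors are an assumption; drop them to search inside longer text.
function buildNameRegex(letters) {
  return new RegExp(`^[${letters}][${letters}' ,"-]*[${letters}'",]+$`);
}

const listRe = buildNameRegex(LETTERS);

console.log(listRe.test('Jean-François')); // true
console.log(listRe.test('Łukasz'));        // false: Ł is not in the hand-written list
console.log(listRe.test('x2'));            // false: digits are not allowed
```

The second result shows the trade-off raised in the comments: a hand-written list only matches the letters someone remembered to include.
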
revo
  • I was thinking about something like that. It probably looks better with a template string: ```[a-z${pL}][a-z${pL}\\ ,"-]*``` etc. Or maybe not. – Francesco May 04 '18 at 16:13
  • Thank you, updated accordingly. – revo May 04 '18 at 16:18
  • 1
    @revo [they're not allowing support for `\pL` but they are for `\p{L}`](https://github.com/tc39/proposal-regexp-unicode-property-escapes#why-not-support-eg-pl-as-a-shorthand-for-pl) – ctwheels May 04 '18 at 16:36
  • @ctwheels I could use `\p{Letter}` instead. My point is mainly that a known Unicode property exists, not which syntax for it will actually be supported. – revo May 04 '18 at 16:41
2

You can use the XRegExp add-on, which supports matching Unicode letters:

var unicodeWord = XRegExp("^\\pL+$"); // L: Letter

You can find more examples of matching Unicode in JavaScript here:

http://xregexp.com/plugins/

Federico Piazza