
I know that \w matches any word character: [a-zA-Z0-9_], or [\p{L}\p{N}_] if compiled with (?u).
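
For illustration, here is a minimal Perl sketch contrasting ASCII and Unicode \w. It assumes Perl 5.14 or later, which added the /a and /u modifiers. The sample characters are arbitrary choices.

```perl
use strict;
use warnings;
use utf8;   # treat the literals below as Unicode characters, not UTF-8 bytes
binmode STDOUT, ':encoding(UTF-8)';

my @samples = ('a', '9', '_', 'é', 'Ç', '-');

# /a restricts \w to ASCII [A-Za-z0-9_]; /u applies Unicode word-character rules.
my @ascii   = grep { /\A\w\z/a } @samples;
my @unicode = grep { /\A\w\z/u } @samples;

print "ASCII \\w:   @ascii\n";     # a 9 _
print "Unicode \\w: @unicode\n";   # a 9 _ é Ç
```

Neither mode is tied to a language. Unicode \w accepts every letter in every script, not just the French ones.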

The French language has 41 lowercase letters, [a-zàâæçéêëîïôœùûüÿ]. Is it possible to build my regex according to my locale?

How can I match [a-zàâæçéêëîïôœùûüÿ] with \w?

A partial answer would be to use Unicode regexes with \p{Latin}.
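
The following small Perl sketch shows why \p{Latin} is only a partial answer. It accepts any Latin-script letter, including letters outside the French alphabet such as ø and ñ. An explicit class built from the list above accepts only French letters; /i also admits their uppercase forms. The sample characters and the names %result and $french_letter are illustrative choices, not part of the original question.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# The lowercase French letters listed in the question; /i also admits uppercase forms.
my $french_letter = qr/\A[a-zàâæçéêëîïôœùûüÿ]\z/i;

my %result;
for my $ch ('é', 'Œ', 'ø', 'ñ', '7') {
    my $latin  = $ch =~ /\A\p{Latin}\z/ ? 1 : 0;   # any Latin-script letter
    my $french = $ch =~ $french_letter  ? 1 : 0;   # French letters only
    $result{$ch} = [$latin, $french];
    printf "%s  \\p{Latin}=%d  French class=%d\n", $ch, $latin, $french;
}
```

Unlike \w, the explicit class does not include digits or the underscore. Add them (e.g. [0-9_]) if you need a French-only replacement for \w.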

nowox
  • Please tag your question with the programming language you are using. You will get a better answer this way. – Tim Biegeleisen Aug 28 '15 at 06:49
  • possible duplicate of [Regular expression to match non-English characters?](http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters) – KeyNone Aug 28 '15 at 06:50
  • It depends on your regular expression engine, but `\w` typically is, or can be made, locale-sensitive. – chepner Aug 28 '15 at 06:51
  • @chepner Is it also the case on regex101? – nowox Aug 28 '15 at 06:52
  • regex101 seems to only use the `u` modifier for matching Unicode characters, which might be more general than you want. (For example, `ø` is not in the French alphabet, but `/\w/u` would match it.) – chepner Aug 28 '15 at 06:57
  • The question is now tagged 'perl' and 'pcre'. Which of the two are you asking about? – reinierpost Aug 28 '15 at 08:15
  • @reinierpost Both actually. regex101 which I am using quite often is `pcre`, on the shell I prefer `perl` over `sed`, but I am also using `ag` which is `pcre`... – nowox Aug 28 '15 at 20:07
  • OK thanks for the info, then the tags are perfect! – reinierpost Aug 29 '15 at 21:41

1 Answer


The /l modifier makes the match locale-aware:

"foo" =~ m/\w/l;

Instead of using /l directly, though, use `use locale`, per mob's link.
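
Here is a minimal Perl sketch of the `use locale` approach. It assumes a single-byte French locale named fr_FR.ISO8859-1 is installed. That name is an assumption because locale names vary by system (`locale -a` lists them), and if the locale is missing, the current LC_CTYPE is kept. Every regex compiled inside the `use locale` block implicitly gets /l, so whether \w matches the byte \xE9 (é in ISO-8859-1) depends on the active locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Assumption: this locale name exists on the system; adjust it to match `locale -a`.
my $wanted = 'fr_FR.ISO8859-1';
my $active = setlocale(LC_CTYPE, $wanted);
print defined $active ? "LC_CTYPE is now $active\n"
                      : "$wanted not available; keeping the current LC_CTYPE\n";

my $e_acute = "\xE9";   # 'é' as one ISO-8859-1 byte, not a UTF-8 sequence
my ($ascii_ok, $e_is_word);
{
    use locale;   # regexes compiled in this lexical scope implicitly get /l
    $ascii_ok  = "foo" =~ /\w/          ? 1 : 0;
    $e_is_word = $e_acute =~ /\A\w\z/   ? 1 : 0;   # decided by the active locale
}
print "\\w matches foo: $ascii_ok\n";
print "\\w matches \\xE9 under use locale: $e_is_word\n";
```

Because `use locale` is lexically scoped, code outside the block keeps the default matching rules.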

chepner
  • Is `l` a `Perl` specific flag? – nowox Aug 28 '15 at 06:55
  • Possibly? There is no standard set of flags, which is why regular-expression questions require a specific language tag. – chepner Aug 28 '15 at 06:57
  • https://metacpan.org/pod/distribution/perl/pod/perlre.pod#Character-set-modifiers -- recommends that you don't use the `/l` modifier directly. Instead, `use locale` and any regex compiled in the scope of your locale will implicitly use the `/l` modifier. – mob Aug 28 '15 at 13:47