I have this regexp:

(\b)(emozioni|gioia|felicità)(\b)

In a string like the one below:

emozioni emozioniamo felicità felicitàs

it should match the first and the third word. Instead it matches the first and the last. I assume it is because of the accented character. I tried this alternative:

(\b)(emozioni|gioia|felicità\s)(\b)

but it matched "felicità" only if there is another word after it. To be specific, it matches only in this context:

emozioni emozioniamo felicità felicitàs

and not in this other:

emozioni emozioniamo felicitàs felicità

I found a question about accented characters in French (where they occur at the beginning of the word) and followed its second answer. If anyone knows a better solution, it is very welcome.

softwareplay

2 Answers

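A word boundary \b matches only at a position between a character in the \w class, i.e. [0-9a-zA-Z_], and a character outside it (this holds in regex flavors such as JavaScript, where \w is ASCII-only). Since à is not in \w, there is no boundary between an accented character like à and a following space, so you can't rely on a \b after it.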
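To make the behaviour concrete, here is a minimal sketch that runs the pattern from the question against the sample string. It assumes a JavaScript engine with `String.prototype.matchAll` (for example Node 12 or later); the variable names are only illustrative.

```javascript
// JavaScript's \w is ASCII-only, so "à" is treated as a non-word character.
const demoText = "emozioni emozioniamo felicità felicitàs";
const boundaryPattern = /\b(emozioni|gioia|felicità)\b/g;

// Collect every match together with the index where it starts.
const boundaryMatches = [...demoText.matchAll(boundaryPattern)].map((m) => ({
  word: m[0],
  index: m.index,
}));

console.log(/\w/.test("à")); // false
console.log(boundaryMatches);
// Two matches: "emozioni" at index 0, and "felicità" at the start of
// "felicitàs" (à followed by "s" is a boundary). The standalone
// "felicità" is skipped because à followed by a space is not a boundary.
```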

You can solve the problem in your case using a lookahead:

felicità(?=\s|$)

or shorter:

felicità(?!\S)

(or use \W in place of \s, as @Sniffer suggested, but then you risk matching inside something like felicitàà, because à itself counts as \W)
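As a quick check, the sketch below applies the `(?!\S)` lookahead to the whole word list from the question. Combining it with the leading `\b` and the alternation is my assumption, not something spelled out above, and the helper name `findWords` is hypothetical.

```javascript
// The trailing \b is replaced by a negative lookahead: the word must not be
// followed by a non-whitespace character (so whitespace or end of string).
const lookaheadPattern = /\b(emozioni|gioia|felicità)(?!\S)/g;

// Return all matched words, or an empty array when nothing matches.
const findWords = (s) => s.match(lookaheadPattern) || [];

console.log(findWords("emozioni emozioniamo felicità felicitàs"));
// [ 'emozioni', 'felicità' ]
console.log(findWords("emozioni emozioniamo felicitàs felicità"));
// [ 'emozioni', 'felicità' ]  (the final word, at end of string)
console.log(findWords("felicità, gioia"));
// [ 'gioia' ]  (a comma right after the word blocks the match)
```

The last call shows the trade-off of this variant: a word directly followed by punctuation is not matched.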

Casimir et Hippolyte

Try the following alternative:

\b(emozioni|gioia|felicità)(?=\W|$)

This will match any of your listed words, as long as the word is followed by either a non-word character \W or the end of the string $.
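Here is a small sketch of how this pattern behaves in JavaScript, including the case where an accented letter follows the word. The second pattern is not part of this answer: it is a hypothetical Unicode-aware variant that assumes an engine with the `u` flag and `\p{...}` property escapes (ES2018 or later).

```javascript
// Pattern from this answer: word followed by \W or end of string.
const asciiLookahead = /\b(emozioni|gioia|felicità)(?=\W|$)/g;

// Hypothetical variant (an assumption, not from the answer): reject any
// following Unicode letter, digit, or underscore instead of relying on \W.
const unicodeLookahead = /\b(emozioni|gioia|felicità)(?![\p{L}\p{N}_])/gu;

// Return all matches of `re` in `s`, or an empty array.
const findAll = (re, s) => s.match(re) || [];

console.log(findAll(asciiLookahead, "felicità, gioia!"));   // [ 'felicità', 'gioia' ]
console.log(findAll(asciiLookahead, "felicitàà"));          // [ 'felicità' ]  (à counts as \W)
console.log(findAll(unicodeLookahead, "felicità, gioia!")); // [ 'felicità', 'gioia' ]
console.log(findAll(unicodeLookahead, "felicitàà"));        // []
```

Note that the leading `\b` is still ASCII-based in both patterns, so neither helps with words that start with an accented character.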

Regex101 Demo

Ibrahim Najjar
  • @softwareplay If you are **not forced** to use that `\b`, then don't worry: whatever words you put in that list, this would work. – Ibrahim Najjar Oct 17 '13 at 13:33
  • @Sniffer Considering that JS does not support lookbehinds, do you have a solution for words _starting_ with accented characters? – UncleZen Dec 01 '14 at 20:36