I'm working with JavaScript and would like a regex that finds an exact match of a word. I would usually do this:

const regex = new RegExp(`\\b${word}\\b`, "gm");

Where word comes from an array. However, with the expression \bsé\b and this block of text:

no lo sé muy bien
casé
sénégal

it does not work as expected. Does regex have any special issue with accented letters? What approach should I take?

Thanks!
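For reference, here is a minimal sketch of what the pattern above actually matches on that block. It assumes the "é" characters are precomposed (U+00E9); the names `text`, `naive`, and `naiveHits` are only for illustration. In JavaScript, `\b` is defined in terms of `\w`, which covers only `[A-Za-z0-9_]`, so "é" is treated as a non-word character.

```javascript
// The sample block from the question, with "\n" between the lines.
const text = "no lo sé muy bien\ncasé\nsénégal";

// Same construction as in the question, with word = "sé".
const naive = new RegExp("\\bsé\\b", "gm");

// Collect the start index of every match.
const naiveHits = Array.from(text.matchAll(naive), (m) => m.index);

// The standalone "sé" (index 6) is missed: "é" followed by a space
// is non-word next to non-word, so there is no boundary there.
// Index 23, the start of "sénégal", matches instead, because the
// position between "é" (non-word) and "n" (word) counts as a boundary.
console.log(naiveHits); // [ 23 ]
```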


POST-EDIT: Since regex word boundaries don't handle accented letters by default, I ended up doing this:

function enhanceRegex(word) {
  const accents = ["á", "é", "í", "ó", "ú"];
  return accents.includes(word.toLowerCase().slice(-1))
    ? `\\b${word}(\\s|!|\\?|\\.|,|;|:)+(\\b)?`
    : `\\b${word}\\b`;
}

Basically, I'm adding the most common scenarios I'll have in my texts. It's probably not the best solution, but it helps.
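One possible alternative, sketched here rather than taken from the post, is to replace `\b` with Unicode-aware lookarounds. This assumes an engine that supports lookbehind and `\p{...}` property escapes with the `u` flag (ES2018 or later). The helper names `escapeRegex` and `wholeWordRegex` are hypothetical.

```javascript
// Escape regex metacharacters so the word is matched literally.
function escapeRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a whole-word regex that treats any Unicode
// letter, combining mark, digit, or underscore as part of a word.
function wholeWordRegex(word) {
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  return new RegExp(
    `(?<!${wordChar})${escapeRegex(word)}(?!${wordChar})`,
    "gu"
  );
}

const sample = "no lo sé muy bien\ncasé\nsénégal";
const hits = Array.from(
  sample.matchAll(wholeWordRegex("sé")),
  (m) => m.index
);

// Only the standalone "sé" matches; "casé" and "sénégal" are skipped.
console.log(hits); // [ 6 ]
```

Unlike a pattern that requires trailing whitespace or punctuation, this sketch should also match when the word is the last thing in the text, and the match itself contains only the word.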

Rudy Palacios
  • The following is the equivalent of `indexOf` if it comes to [internationalization](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl) and locale-based [word-segmentation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter#examples). The index of `'sé'` within `'no lo sé muy bien\ncasé\nsénégal'` is `6` ...prove... `Array.from(new Intl.Segmenter('es', { granularity: 'word' }).segment('no lo sé muy bien\ncasé\nsénégal')).find(({ segment }) => segment === 'sé')?.index ?? -1` ...for this alone I vote for a reopening. – Peter Seliger Feb 26 '22 at 01:47
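The comment's one-liner returns only the first occurrence. A rough extension that collects every occurrence might look like the sketch below. It assumes a runtime that ships `Intl.Segmenter`, and it only handles single-word search terms. The function name `findWordIndices` and its `locale` default are made up for illustration.

```javascript
// Hypothetical helper: returns the start index of every word-like
// segment that is exactly equal to `word`.
function findWordIndices(text, word, locale = "es") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    // isWordLike filters out whitespace and punctuation segments.
    .filter((s) => s.isWordLike && s.segment === word)
    .map((s) => s.index);
}

const found = findWordIndices("no lo sé muy bien\ncasé\nsénégal", "sé");
console.log(found); // [ 6 ]
```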
