
I am trying to find certain words in an HTML string. A word should match if any of the following applies:

  • The word is at the beginning of the string (^).
  • The word is in the middle and there is a space before it.
  • The word is at the beginning, directly after a tag.

I am able to match the first two cases but not the third.

Example string:

Leading a team of 5.
You will be leading a team of 5
<span style="color:#f0f;">Leading a team of 5</span>
The code is ok
He is a good coder

The result should be: [Leading, leading, Leading, He]

My current regex:

/(?:^|\s)(lead[a-z]{0,}|he[\s])/gi

I am using replace to enrich the words, for example:

text.replace(regex, `<b>$1</b>`);

I cannot figure out how to get the word only.

I know I can remove the (?:^|\s) part, but that would affect short words like he, which would then also match inside the, The, etc.
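To make that concern concrete, here is a minimal sketch using one of the example lines. The variable names are only illustrative, and the word-boundary variant is one possible alternative (it is what the first comment below suggests), not necessarily what the spec wants, since `\b` also matches after punctuation such as `#`.

```javascript
// Without the (?:^|\s) prefix, "he" is found inside longer words.
const sample = "The code is ok";

const loose = /(lead[a-z]*|he)/gi;
const looseMatches = sample.match(loose);
console.log(looseMatches); // the "he" inside "The" is matched

// Assumption: a word boundary on both sides is acceptable for this use case.
const bounded = /\b(lead[a-z]*|he)\b/gi;
const boundedMatches = sample.match(bounded);
console.log(boundedMatches); // no match, so match() returns null
```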

Oras
  • I don't understand the spec. "The word is in the middle and there is a space before it." applies to pretty much every word in your lines. How is it that `leading` was matched and none of the words in `The code is ok` were? If you want `leading` and `he` specifically, why not match those directly? `s.replace(/\b(leading|he)\b/ig, "$1")` – ggorlen Apr 14 '21 at 01:55
  • Does this answer your question? [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Lucas Apr 14 '21 at 02:54

1 Answer


You might use:

(?:^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b

The pattern matches

  • (?: Non capture group
    • ^(?:<[^>]*>)? Start of string, then optionally match a tag-like pattern (assuming no > chars before the closing >)
    • | Or
    • \s Match a whitespace char
  • ) Close the non-capture group
  • (he|lead[a-z]*) Capture group 1, match either he, or lead followed by optional chars a-z
  • \b A word boundary to prevent a partial match


const regex = /(?:^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b/gi;
[
  "Leading a team of 5.",
  "You will be leading a team of 5",
  "<span style=\"color:#f0f;\">Leading a team of 5</span>lead",
  "The code is ok",
  "He is a good coder",
  "test the lead test !@#$leadi and leading"
].forEach(s =>
  console.log(`${s} ==> ${Array.from(s.matchAll(regex), m => m[1])}`)
);
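Since the question uses replace to wrap the words, note that the prefix (whitespace or tag) is part of the full match and would be dropped by a replacement of just `<b>$1</b>`. One way around that, sketched here under the assumption that each line is processed as its own string, is to capture the prefix as well and put it back. The name `enrich` and the two-group variant of the pattern are illustrative, not part of the pattern above.

```javascript
// Group 1 keeps the prefix (start of string with an optional tag, or whitespace);
// group 2 is the word itself.
const enrichRegex = /(^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b/gi;

// enrich is a hypothetical helper name used only for this sketch.
function enrich(text) {
  // Re-emit the prefix unchanged and wrap only the word.
  return text.replace(enrichRegex, "$1<b>$2</b>");
}

console.log(enrich("You will be leading a team of 5"));
console.log(enrich('<span style="color:#f0f;">Leading a team of 5</span>'));
console.log(enrich("The code is ok"));
```

With this variant the space before leading and the opening span tag stay in the output, and The is left untouched because he is only accepted directly after the prefix.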
The fourth bird