I am trying to build a regex that matches a word inside <p> and <div> tags and replaces that word with some other text. The word could be at the start of a tag, between other words, or at the end of a sentence (followed by a full stop, a comma, or a semicolon). The tags could also have CSS classes as attributes. My regex works, but not completely.

My regex: `[^<>\n]*\b(Cat|Dog|Fish)\b[^<>\n]*`

So, if the text is something like this:

(1) <p> Cat test dfdsf</p>
(2) <p> Cat.</p>
(3) <p>Cat.</p>
(4) <p class="test">Cat</p>
(5) <div>Cat</div>
(6) <p>Catfgdggh</p>
(7) <li>Cat</li>

It should match all of the above except (6) and (7). Also, only "Cat" should be matched, not the other words within the tag.

Any help would be much appreciated. Could you also please explain how the regex works? Thanks :)

NomadTraveler
  • What about cases like `Cat-astrophe`? Do you want to allow the keywords as parts of compound words? – Wiktor Stribiżew Jul 23 '15 at 07:47
  • No, that shouldn't match. The idea is to match a complete word. A word could be at the end of a sentence, followed by a `.`, `,` or `;`, and it should still match. – NomadTraveler Jul 23 '15 at 23:24

1 Answer

\b(Cat|Dog|Fish)\b

Use \b, the word boundary.

\b asserts the position at a word boundary (^\w|\w$|\W\w|\w\W)
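
As a quick check of how `\b` behaves on the strings from the question, here is a small JavaScript sketch. The `strict` pattern is an assumption added for the `Cat-astrophe` case raised in the comments, not part of the regex above, and it relies on lookbehind support (ES2018 or later).

```javascript
// Word-boundary check against sample strings from the question.
const word = /\b(Cat|Dog|Fish)\b/;

const samples = ["Cat test dfdsf", "Cat.", "Catfgdggh", "Cat-astrophe"];
const plain = samples.map((s) => word.test(s));
console.log(plain); // [ true, true, false, true ]
// "Catfgdggh" fails: "t" and "f" are both word characters, so there is
// no boundary after "Cat". "Cat-astrophe" still matches, because "-"
// is not a word character.

// Assumption: hyphenated compounds should be rejected (per the comments).
// The lookarounds forbid a word character or hyphen on either side.
const strict = /(?<![\w-])(Cat|Dog|Fish)(?![\w-])/;
const strictResults = samples.map((s) => strict.test(s));
console.log(strictResults); // [ true, true, false, false ]
```

Trailing punctuation such as `.`, `,` or `;` is accepted by both patterns, since none of those characters is a word character or a hyphen.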

vks
  • Cheers @vks. I have just added another criterion to the question: it needs to match only within p and div tags. The tags could have a CSS class as an attribute as well. – NomadTraveler Jul 24 '15 at 02:28
  • That is brilliant. Would you be kind enough and explain as well please? – NomadTraveler Jul 24 '15 at 05:13
  • @NomadTraveler `[^<>\n]*` matches anything inside the tags. We capture the tag name, and with `\1` we make sure we match the same closing tag. – vks Jul 24 '15 at 05:25
  • Thanks so much! I have a very silly question, which is probably unrelated. I know the regex works in https://regex101.com/r/oC5rY5/3, but it doesn't work on my website. I am making this regex call in an AngularJS file which is part of a .NET solution. – NomadTraveler Jul 24 '15 at 06:12
  • @NomadTraveler I have very little idea about that. In .NET you need to use `@` (a verbatim string literal). – vks Jul 24 '15 at 06:13
  • Sorry @vks, I've got one more question. Would your regex match the whole sentence or just the word? I just want to match the word and replace it with something else, but it looks like your regex is matching the whole sentence. – NomadTraveler Jul 27 '15 at 02:02
  • Thanks @vks, is there no way in JS to just match the word? Capture and return just the matched word. – NomadTraveler Jul 27 '15 at 04:41
  • I don't know how to do the replace in JS using variables. The replace should only be done if a match is found. How do I know whether a match was found? @vks – NomadTraveler Jul 27 '15 at 04:57
  • Sorry to make it difficult. How can I access the matched word, $2 in JS, once the regex match is done? I need to fetch some data based on the word that is matched. – NomadTraveler Jul 27 '15 at 05:13
  • Thanks so much @vks. Can I possibly access $3 before a replace? Like after a match()? – NomadTraveler Jul 27 '15 at 05:37
  • So, `if(str.match(regex)) { var a = $3; }`? – NomadTraveler Jul 27 '15 at 05:39
  • @NomadTraveler you can check the code using the code generator on the left side: https://regex101.com/r/oC5rY5/12#javascript – vks Jul 27 '15 at 05:40
  • I want to store the value of $3 into a variable. Can't seem to access it though :( Can you please help? – NomadTraveler Jul 27 '15 at 05:43
  • @NomadTraveler you can post it as a new question, as I am not familiar with JavaScript :( Also, you can accept this answer and close this question :) – vks Jul 27 '15 at 06:15
  • Sorry @vks, back to this. Won't it return multiple occurrences of a word in the sentence? – NomadTraveler Jul 27 '15 at 07:28
  • When there are multiple occurrences within the same tag, it only gives one of them in match group 3 - https://regex101.com/r/oC5rY5/20 – NomadTraveler Jul 27 '15 at 22:51
  • @NomadTraveler when a capturing group repeats, the regex engine remembers only its last match. So if you want all of them, first extract the required HTML tag and then extract the words. – vks Jul 28 '15 at 04:28
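
Following up on the comments above, here is a minimal JavaScript sketch of the two-step approach: first match the <p> or <div> element, then replace each keyword inside its text. The helper name `replaceKeywords`, the bracketed replacement text, and the sample input are all made up for illustration, and the tag pattern is an assumption rather than the regex behind the regex101 links. Note that `$3` is not a variable in JavaScript; the captured groups arrive as arguments to the `replace` callback (or as indexes of the array returned by `match`).

```javascript
// Step 1: match a <p ...> or <div ...> element whose text holds no nested
// tags; \1 requires the closing tag to have the same name as the opening one.
const tagRe = /<(p|div)\b([^<>]*)>([^<>\n]*)<\/\1>/g;
// Step 2: match every keyword as a whole word inside that text.
const wordRe = /\b(Cat|Dog|Fish)\b/g;

// Hypothetical helper: returns the rewritten HTML plus every matched word.
function replaceKeywords(html, replacer) {
  const found = []; // matched words, in order of appearance
  const result = html.replace(tagRe, (whole, tag, attrs, text) => {
    const newText = text.replace(wordRe, (w) => {
      found.push(w); // the matched word is available here, before replacing
      return replacer(w);
    });
    return `<${tag}${attrs}>${newText}</${tag}>`;
  });
  return { result, found };
}

const input = [
  '<p class="test">Cat and Dog, then Cat.</p>',
  "<div>Fish</div>",
  "<p>Catfgdggh</p>",
  "<li>Cat</li>",
].join("\n");

const { result, found } = replaceKeywords(input, (w) => `[${w}]`);
console.log(found); // [ 'Cat', 'Dog', 'Cat', 'Fish' ]
console.log(result);
// <p class="test">[Cat] and [Dog], then [Cat].</p>
// <div>[Fish]</div>
// <p>Catfgdggh</p>
// <li>Cat</li>
```

An empty `found` array means nothing matched, which also answers the "how do I know whether a match was found" question. The sketch skips elements that contain nested tags; for real pages a DOM parser is likely to be more robust than a regex.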