
I need a regex pattern that matches the first occurrence of a word that is not wrapped in an `a` tag but may be wrapped in any other tag.

That is, something like a negative lookahead that checks whether the matching word is inside an `a` tag and, if so, ignores it and keeps looking for a valid match.

Example strings

Payload 1:

<p>Sample 1 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some random text 
to not be matched followed by wordToMatch, this should work.</p>

Expected Result 1:

wordToMatch ("Not the one inside the 'a' tags but the following one")

Payload 2:

<p>Sample 2 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some random text 
to not be matched followed by <b>wordToMatch</b> this should work.</p>

Expected Result 2:

wordToMatch ("The one inside the 'b' tags")

Payload 3:

<p>Sample 3 <a href="shouldNotMatchWrappedInA">wordToMatch</a> some 
random text to not be matched followed by wordToMatch followed by 
further occurrences of wordToMatch which should not be matched.</p>

Expected Result 3:

wordToMatch ("The second occurrence of the term")

Please help :'(

The language being used is Java.

  • 2
    [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ctwheels Jan 08 '18 at 18:13
  • Use some `xpath` queries instead - what is your programming language? – Jan Jan 08 '18 at 19:08
  • Java but processing the response of an ingested JCR query. XPATH may be an option. – CoDemystified JavaFx Jan 09 '18 at 08:20

1 Answer

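The simplest pattern I can think of captures the word inside the `a` tag:

<a\b[^>]*>(\w+)<\/a>

To find the next occurrence of that word outside the link, extend the pattern with a backreference. To test it, run this Perl script: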

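use strict; use warnings;

my $result = "<p>Sample 1 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some random text to not be matched followed by <b>wordToMatch</b>, this should work.</p>";

# Group 1 is the word inside the <a> element; group 2 is its first repetition after </a>.
# [^>]* keeps the match inside the opening tag, and the lazy .*? stops at the first repetition.
if ($result =~ m/<a\b[^>]*>(\w+)<\/a>.*?\b(\1)\b/s) {
    print "$2\n";
}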

Note that the result is in the second capture group. Unfortunately, I cannot give you an answer in Java.
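Since the question asks about Java, here is a rough sketch of how the same backreference idea might be ported. The class name `FirstRepeatAfterLink` and the method `indexOfFirstRepeat` are made up for illustration. It assumes the word first appears as the whole text of an `a` element, and it uses `[^>]*` and a lazy `.*?` so that the first repetition after the link is the one found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class FirstRepeatAfterLink {
    // Group 1: the word inside the <a> element.
    // Group 2: the first repetition of that word after the closing </a>.
    // DOTALL lets .*? cross line breaks in the payload.
    private static final Pattern PATTERN =
            Pattern.compile("<a\\b[^>]*>(\\w+)</a>.*?\\b(\\1)\\b", Pattern.DOTALL);

    /** Returns the start index of the first repetition after the link, or -1 if there is none. */
    static int indexOfFirstRepeat(String html) {
        Matcher m = PATTERN.matcher(html);
        return m.find() ? m.start(2) : -1;
    }

    public static void main(String[] args) {
        String html = "<p>Sample 3 <a href=\"shouldNotMatchWrappedInA\">wordToMatch</a> some "
                + "random text to not be matched followed by wordToMatch followed by "
                + "further occurrences of wordToMatch which should not be matched.</p>";
        int start = indexOfFirstRepeat(html);
        if (start >= 0) {
            // Print the position and the matched text at that position.
            System.out.println(start + ": " + html.substring(start, start + "wordToMatch".length()));
        }
    }
}
```

This only checks that the repetition comes after the first link; it does not verify that the repetition itself sits outside a later `a` element. For that, an HTML parser or the XPath route suggested in the comments is likely more robust.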

volkinc