-1

Suppose we have a string <span id='make-this-bold'>A giant pan is still a pan.</span>

I want to match the string pan inside the content of the tag, but not inside the tag itself. In this example, the desired outcome would be the two occurrences of pan in 'A giant pan is still a pan'. My initial thought was to use a negative lookahead or lookbehind, but then I would have to write a different regex for every other string I want to match, like is or old.

To summarize, I'm trying to write a regex that looks for a string, like pan, outside a specific pattern, <span id='make-this-bold'>. How can I easily accomplish this?

Edit on Nov. 19th 2019

Thanks for reading and answering this. To clarify, the object we are dealing with is a string, not HTML elements: at this point there are no HTML DOM elements that contain this text or any make-this-bold tag. The original goal is to inject one specific tag around certain words in plain text so they can be rendered bold on the web page. The current script treats this as a string operation: a for loop goes through a list of words that need to be bold, searches for each word in the string, and, if it is found, injects the tag around it.
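For illustration, the loop described above might look roughly like the following minimal sketch. The names `injectNaively`, `OPEN`, and `CLOSE` are hypothetical and not taken from the actual script. The sketch shows where the problem comes from: once a tag has been injected for one word, a later word such as pan also matches inside the injected `<span` and `</span>`.

```javascript
// Hypothetical constants: the tag that is injected around each word.
const OPEN = "<span id='make-this-bold'>";
const CLOSE = "</span>";

// Hypothetical reconstruction of the loop: wrap each word, one word at a time.
function injectNaively(text, words) {
  let result = text;
  for (const word of words) {
    // Replaces every occurrence of the word, including occurrences
    // inside tags that an earlier iteration already injected.
    result = result.split(word).join(OPEN + word + CLOSE);
  }
  return result;
}

const naive = injectNaively("A giant pan is still a pan.", ["giant", "pan"]);
// The tag injected around "giant" is corrupted by the pass for "pan".
console.log(naive);
```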

Therefore, as the question is framed, the task is not to parse HTML but to parse a string, and the answer is not in this question: RegEx match open tags except XHTML self-contained tags.

Update

As a workaround, I'm simply replacing <span id='make-this-bold'> and </span> with a string constant before injection and reverting them after injection. I still hope there's an elegant regex solution that looks for pattern A but ignores it if pattern A is inside pattern B.
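One common way to "match A unless it is inside B" is to put B first in an alternation and capture only A: the regex consumes each whole tag, so the word can never match inside it. Below is a minimal sketch under some assumptions: `wrapOutsideTags` and `escapeRegExp` are hypothetical helper names, tags are assumed to contain no `>` inside attribute values, and words are matched as substrings (add `\b` around the group if whole words are wanted).

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every listed word that is outside a <...> tag.
function wrapOutsideTags(text, words) {
  const alternatives = words.map(escapeRegExp).join("|");
  // Pattern B (a whole tag) comes first and has no capture group;
  // pattern A (one of the words) is captured in group 1.
  const pattern = new RegExp("<[^>]*>|(" + alternatives + ")", "g");
  return text.replace(pattern, (match, word) =>
    // A tag matched: put it back untouched. A word matched: wrap it.
    word === undefined ? match : "<span id='make-this-bold'>" + word + "</span>"
  );
}

const input = "A <span id='make-this-bold'>giant</span> pan is still a pan.";
const wrapped = wrapOutsideTags(input, ["pan", "is"]);
console.log(wrapped);
```

Because all words are handled in a single pass, text that has just been injected is never scanned again, so the placeholder swap before and after injection should not be needed in this sketch.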

joebeav
  • Do not use a reg exp on HTML. https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/ There is always a better solution. – epascarello Nov 18 '19 at 20:07
  • HTML is not a regular language and cannot be correctly parsed with regular expressions. You may want to use an HTML tree parser such as the lxml library in python instead. – deterjan Nov 18 '19 at 20:09
  • To achieve your goal, **it is mandatory to do a syntactic analysis of the HTML first**, before you search for a sequence of characters. – Mister Jojo Nov 19 '19 at 18:33

2 Answers

0

As the comments say, "Do not use a reg exp on HTML", for the reasons given there. But if you still have to do it that way, you could try the following, with one restriction: no nested tags.

const str = "<span id='make-this-bold'>A giant pan is still a pan.</span>";
console.log(str.match(/>(.*)</g)[0].match(/pan/g));
Addis
0

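With JavaScript you can read the content inside the tag without taking the tag's attribute names into account. An example:

      // Read only the text inside the tag; outerHTML would also include
      // the tag itself, and "span" contains "pan".
      var str = document.getElementById("make-this-bold").textContent;
      console.log(str);
      // The g flag returns every occurrence, not just the first.
      var res = str.match(/pan/g);
      console.log(res);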
      
<span id='make-this-bold'>A giant pan is still a pan.</span>
alarti