
I have very simple HTML that is generated from a JSON database of strings like this:

"<div style=\"padding-top:59px;\"><a href=\"http://www.macm.org/en/index.html\"><img src=\"http://www.artimap.com/montreal/www.macm.org.jpg\"><br>www.macm.org/en/index.html</a><h1>Musée d'art contemporain de Montréal</h1><p></p><p>A major Canadian institution dedicated exclusively to contemporary art, the Musée offers a varied program ranging from presentations of its Permanent Collection to exhibitions of works by Québec, Canadian and international artists. The Permanent Collection comprises some 7,000 works, including the largest collection of art by Paul-Émile Borduas.</p><div><p>185, Sainte-Catherine West (corner Jeanne-Mance)</p><p>H2X 3X5</p></div><b>514 847-6226</b></div>"

I also have a variable RESULTSshow that is a concatenation of such strings, and another variable, searchterm, that holds the search term. I want to wrap each occurrence of searchterm in the results in HTML i tags: <i>searchterm</i>. I am using these regular expressions and this function for each tag I am interested in, for example:

var REG=new RegExp(searchterm,'gmi');
var regFUN=function(x){return x.replace(REG,"<i>$&</i>");};
var reg = new RegExp('<p>(.*?)</p>','gmi');
RESULTSshow=RESULTSshow.replace(reg,regFUN);
(I do this for every tag I am interested in highlighting.)
This produces <i>searchterm</i> as intended, but it also gives <<i>p</i>> if searchterm === "p", which has been bugging me for the last two days.

The problem is that if searchterm is "p", that will not only change the text inside the tags but also change the tag itself.
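
A minimal reproduction of the behaviour described above, using a hypothetical one-paragraph input in place of RESULTSshow. It suggests why the tags are affected: the callback's first argument is the entire match, tags included.

```javascript
// Reproduction sketch; the input string is made up for illustration.
var searchterm = "p";
var REG = new RegExp(searchterm, "gmi");
var reg = new RegExp("<p>(.*?)</p>", "gmi");

// x is the whole match ("<p>deep</p>"), so the "p" in the tag names
// is replaced along with the "p" in the text.
var broken = "<p>deep</p>".replace(reg, function (x) {
    return x.replace(REG, "<i>$&</i>");
});

console.log(broken);
```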

How can I stop it from changing the tags? I really want to do it with a RegExp rather than by looping through the HTML (DOM), for the sake of speed.

  • Parsing HTML with regex? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – PleaseStand Nov 21 '10 at 06:39
  • You better edit the question title... I would not call this "simple". – Shadow The GPT Wizard Nov 21 '10 at 08:40
  • You can't tell a "p" in normal text from a "p" inside a tag without parsing the DOM. Definitely not reliably with regex, especially with a regex flavor that doesn't even support lookbehind assertions. – Tim Pietzcker Nov 21 '10 at 08:48
  • idealmachine, it's not like parsing arbitrary HTML; it is for parsing very simple tags: h1, p and b tags, all of them without any attributes or anything else. A perfect job for a RegExp, it seems to me. – Sylvain Picker Nov 22 '10 at 19:15
  • Shadow Wizard you are right, I edited the title – Sylvain Picker Nov 22 '10 at 19:18

2 Answers


Now using this wonderful little RegExp instead of the overly complicated first one:

REG=new RegExp("(?![^<>]*>)("+searchterm+")","gi");
RESULTSshow=RESULTSshow.replace(REG,'<i>$1</i>');
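
One caveat with building the pattern by concatenation: if searchterm contains regex metacharacters (for example `.` or `(`), they are interpreted as regex syntax. The sketch below assumes the term should be matched literally; `escapeRegExp` and `highlight` are hypothetical helper names, not part of the original code.

```javascript
// Hypothetical helper: escape regex metacharacters so the term matches literally.
function escapeRegExp(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same lookahead as above, built from the escaped term.
function highlight(html, searchterm) {
    var REG = new RegExp("(?![^<>]*>)(" + escapeRegExp(searchterm) + ")", "gi");
    return html.replace(REG, "<i>$1</i>");
}

console.log(highlight("<p>SuperDuck</p>", "p"));
console.log(highlight("<p>a.b axb</p>", "a.b"));
```

With the escaping in place, a term such as "a.b" should match only the literal text "a.b" and not "axb".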
BenMorel

Well, provided your HTML doesn't contain blocks like SCRIPT, CDATA or STYLE, it's possible with a regex using a negative lookahead:

text = text.replace(/(?![^<>]*>)old/g, 'new');

That said, for more robust handling I'd use a light parser or a home-grown one and not worry about the speed. Note that you'll need to preprocess the source if your attribute values may contain < or > characters.
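
As a rough idea of what a very small home-grown approach could look like, the sketch below splits the string into tag and text segments and highlights only the text segments. The function names are hypothetical, and it shares the limitations mentioned above: it assumes no SCRIPT, STYLE or CDATA blocks and no < or > inside attribute values.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch of a tiny tokenizer-based highlighter (not a real HTML parser).
function highlightText(html, searchterm) {
    var term = new RegExp(escapeRegExp(searchterm), "gi");
    // Splitting on a capturing group keeps the tags in the result array:
    // even indexes hold text, odd indexes hold tags.
    var parts = html.split(/(<[^>]*>)/);
    for (var i = 0; i < parts.length; i += 2) {
        parts[i] = parts[i].replace(term, "<i>$&</i>");
    }
    return parts.join("");
}

console.log(highlightText("<p>SuperDuck</p><p>Jumps</p>", "p"));
```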

Try this :

<html>
<head>
<script>
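function t() {
    // Declare locals with var to avoid creating implicit globals.
    var text = "<html><head></head><body><p>SuperDuck</p><p>Jumps over the lazy dog</p></body></html>";
    // Wrap every "p" that is not inside a tag in <i>...</i>.
    var a = text.replace(/(?![^<>]*>)(p)/g, '<i>$1</i>');
    alert(a);
}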
</script>
</head>
<body>
    <button onclick="t();">hit me!</button>
</body>
</html>

Just replace the (p) in the regex pattern with your search term and you're ready to jump over =)

DarkWingDuck
  • I need to find some "searchTerm" inside the text and then enclose it in i tags. – Sylvain Picker Nov 22 '10 at 19:17
  • The regex I sent definitely does that. The part in parentheses, `(p)`, is the search term, and `$1` is the matched search term, which is how it gets wrapped in `i` tags in the replaced version. Just run the code to see that the search term 'p' is replaced with `<i>p</i>`, but the `<p>` tags are not. – DarkWingDuck Nov 22 '10 at 23:02
  • Yep it seems to work, I tried That:
    – Sylvain Picker Nov 23 '10 at 01:00
  • Thanks Sylvain, a step in the right direction, or just works? =) I thought you were after the regex formula? If so please don't hesitate marking it as the accepted answer by clicking on the check box outline to the left of the answer. This lets other people know that you have received a good answer to your question. Doing this is helpful because it shows other people that you're getting value from the community. – DarkWingDuck Nov 23 '10 at 04:26