
A Perl CGI application provides a search function and writes the matching snippets to the HTML page. Now I would like to highlight the matches inside the snippets. I could use something like

s/($searchregex)/<span class="highlight">$1<\/span>/gi

to highlight the matches. This works fine for text-only snippets, but sometimes breaks when a snippet itself contains HTML tags, e.g. links or images with references. In the failing cases the replacement destroys the HTML links by inserting the span tag inside the href value.
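
A minimal sketch of the failure described above. The snippet text and the search term are made up for illustration; only the substitution itself comes from the question.

```perl
use strict;
use warnings;

# Hypothetical snippet and search term, chosen only to reproduce the problem.
my $snippet     = 'See the <a href="perl-faq.html">Perl FAQ</a> for details.';
my $searchregex = 'perl';

# The naive replacement from the question, applied to a copy of the snippet.
(my $naive = $snippet) =~ s/($searchregex)/<span class="highlight">$1<\/span>/gi;

# The match inside the href attribute is wrapped too, which breaks the link.
print "$naive\n";
```

The visible text "Perl" is highlighted as intended, but the same substitution also lands inside the href value, so the anchor tag is no longer valid HTML.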

At the moment I am seeing three possible solutions:

  1. Write a regex that does not replace matches inside HTML tags, i.e. inside <>. I do not know how to write a replacement regex for this case. Is there a Perl regex that allows this replacement, and what does it look like?

  2. Write a second regex that undoes the wrong replacements made by the one above. This would repair the misplaced span tags inside the href.

  3. Use JavaScript to highlight the matches inside the resulting DOM tree. Possible ways using jQuery are outlined in "highlight html with matching text". Even plain JavaScript may be enough (see "JavaScript's Regular Expression Flavor"). There are also dedicated jQuery plugins for highlighting, such as "highlight regular expressions". I am new to JavaScript, so some more advice is appreciated, too.

What is the preferable solution? The best way would be to do it as in 1., but that does not seem possible. So the remaining question is: do the work in an ugly way on the server side, or introduce JavaScript to solve the problem in a cleaner way on the client side?

Christian
  • You could use an HTML parser on the server side, which is the correct tool for the job you are doing, or do it with JavaScript as you say, which I prefer myself as it is more versatile and could lead to more interactivity. – Billy Moon Aug 07 '12 at 13:11
  • @BillyMoon: For me, an HTML parser is not worth loading for this job. It seems too heavy. But could you please add your comment as an answer? – Christian Aug 07 '12 at 13:25

2 Answers


You could use an HTML parser on the server side, which is the correct tool for the job you are doing.

Or you could do it with JavaScript as you say, which I prefer myself as it is more versatile and could lead to more interactivity, although you would probably face a similar issue to the one you have now (you would just have moved it to the client side).

It is actually a more complex question than it first appears. Without more information, it is impossible to try to come up with a better solution.

One good solution would be to traverse the DOM tree and match against each text node, but then you have the problem that you would not match text that spans several text nodes. For example, if part of "John the Con Johnson" is wrapped in its own inline element, a search for "John the Con" would not match, because the words would be in separate nodes. This might or might not be a problem for you, depending on your use case.
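
The per-text-node idea and its limitation can also be sketched on the server side in Perl, the question's language. This is only a rough approximation, not a real HTML parser: the helper name `highlight_text_nodes` and the `<em>` markup are hypothetical, and the sketch assumes that no attribute value contains a literal '>'.

```perl
use strict;
use warnings;

# Hypothetical helper: highlight $searchregex only in the text between tags.
sub highlight_text_nodes {
    my ($html, $searchregex) = @_;

    # The capturing group makes split return the tags along with the text around them.
    my @parts = split /(<[^>]*>)/, $html;

    for my $part (@parts) {
        next if $part =~ /^</;    # a tag: leave it untouched
        $part =~ s/($searchregex)/<span class="highlight">$1<\/span>/gi;
    }
    return join '', @parts;
}

# Plain text: the phrase sits in one text segment and is highlighted.
my $plain = highlight_text_nodes('John the Con Johnson', 'John the Con');

# Assumed markup: the phrase is split across segments, so nothing matches.
my $nested = highlight_text_nodes('John <em>the Con</em> Johnson', 'John the Con');

print "$plain\n$nested\n";
```

Because each text segment is searched in isolation, the second call returns its input unchanged, which is the multi-node limitation described above.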

Billy Moon

In Perl, with a lookahead after the pattern:

s/($searchregex)(?=[^>]*<)/<span class="highlight">$1<\/span>/gi

or shorter

s/$searchregex(?=[^>]*<)/<span class="highlight">$&<\/span>/gi

But you may need to read the whole file into a string, or change the input record separator ($/) to '<'. The regex only matches the pattern when it is followed by a sequence of characters other than '>' and then a '<', so with the default $/ = "\n" it will not match if there is a newline between the pattern and the next '<'.
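
A small sketch of the lookahead version and the record-separator caveat. The input string and the helper name `highlight` are made up for illustration; the substitution is the first one given above.

```perl
use strict;
use warnings;

# Hypothetical input; note the newline between the last "perl" and the next '<'.
my $html = qq{<p>See the <a href="perl-faq.html">Perl FAQ</a> about perl\nmodules.</p>};
my $searchregex = 'perl';

# Highlight a match only if a '<' follows before any '>' does.
sub highlight {
    my ($text) = @_;
    $text =~ s/($searchregex)(?=[^>]*<)/<span class="highlight">$1<\/span>/gi;
    return $text;
}

# Whole string at once: both text matches are followed by a later '<'.
my $whole = highlight($html);

# Record by record, as with the default $/ = "\n": the first line ends
# before the next '<', so the match just before the newline is skipped.
my $by_line = join '', map { highlight($_) } split /(?<=\n)/, $html;

print "$whole\n---\n$by_line\n";
```

In both runs the href value stays intact. Note that the lookahead also means a match after the last tag in the input, with no '<' following it, would not be highlighted.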

Nahuel Fouilleul