I've been working on a highlight script. The first result can be found in the question "substring selector with jquery?".

The script http://jsfiddle.net/TPg9p/3/

Unfortunately, it only works with a simple string. I want it to work with strings that contain tags.

Example :

<li>sample string li span style="color:red" id 
    <span id="toto" style="color:red">color id</span> 
    abcde
</li>

So if the user searches for span, it should match only the word span in the text inside the <li> (the one before the <span> tag), not the span tag itself. The matched string is then replaced with <span class="highlight">span</span>. The same goes for attribute names and attribute values: anything inside an opening or closing tag should be ignored.

Since HTML is about the DOM and nodes, could we parse this string into nodes and then select only the text nodes for replacement?

Please answer by updating the jsFiddle above.

UPDATED

Demo of working solution by Tibos : http://jsfiddle.net/TPg9p/10/

Thanh Trung

2 Answers

Disclaimer: You should use an HTML parser instead of a regexp here.

The regular expression you are looking for is this one:

/span(?=[^>]*<)/

Example usage:

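var str = '<li>sample string li span style="color:red" id ' +
    '<span id="toto" style="color:red">color id</span> ' +
    'abcde' +
    '</li>';
var keyword = 'span';
// The 'g' flag makes replace() process every occurrence, not just the first.
var regexp = new RegExp(keyword + '(?=[^>]*<)', 'g');
// replace() returns a new string, so the result has to be stored.
var highlighted = str.replace(regexp, '<span class="highlight">$&</span>');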

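The regexp matches the keyword only when the next angle bracket after it is a <, that is, when no > appears before the next <. A keyword inside a tag is followed by that tag's closing > first, so it is skipped.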

EDIT: Seeing how you don't have valid HTML (it doesn't start with a tag and end with a tag), you can change your regular expression to also accept the end of the string rather than only the beginning of a tag:

/span(?=[^>]*(?:<|$))/
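The difference between the two lookaheads can be seen on a string that ends in plain text. This is only a minimal sketch: `highlightOutsideTags`, its `allowEndOfInput` flag, and the sample string are made up for illustration, and the keyword is assumed to contain no regexp metacharacters and no `<` or `>`.

```javascript
// Hypothetical helper: wraps every occurrence of `keyword` that lies outside a tag.
// Assumes `keyword` has no regexp metacharacters and no angle brackets.
function highlightOutsideTags(html, keyword, allowEndOfInput) {
    // Strict form: a "<" must come before any ">".
    // Relaxed form: a "<" or the end of the input must come before any ">".
    var lookahead = allowEndOfInput ? '(?=[^>]*(?:<|$))' : '(?=[^>]*<)';
    var regexp = new RegExp(keyword + lookahead, 'g');
    return html.replace(regexp, '<span class="highlight">$&</span>');
}

// Illustrative input that does not end with a tag.
var sample = 'a span <span id="x">in span</span> after span';

// The strict form skips the trailing "span" because no "<" follows it.
var strict = highlightOutsideTags(sample, 'span', false);

// The relaxed form also highlights the trailing "span".
var relaxed = highlightOutsideTags(sample, 'span', true);
```

In both calls the tag names in `<span id="x">` and `</span>` are left alone, because a `>` is reached before any `<`.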

DEMO: http://jsfiddle.net/TPg9p/8/

EDIT: Added regexp escaping: .replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'). Courtesy of this answer: "Is there a RegExp.escape function in Javascript?"
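As a rough sketch of how the escaping step fits together with the lookahead, the snippet below wraps the replace call from the edit in a function. The names `escapeRegExp` and `highlightLiteral` and the sample markup are assumptions for illustration, not taken from the fiddle.

```javascript
// Escapes regexp metacharacters so the search term is matched literally.
// The character class is the one quoted in the edit above.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: highlights a literal search term outside of tags.
function highlightLiteral(html, keyword) {
    var regexp = new RegExp(escapeRegExp(keyword) + '(?=[^>]*(?:<|$))', 'g');
    return html.replace(regexp, '<span class="highlight">$&</span>');
}

// Illustrative input: "$5.00" appears in text, in an attribute, and in text again.
var html = '<li>price $5.00 or 5x00 <b title="$5.00">$5.00</b></li>';
var result = highlightLiteral(html, '$5.00');
```

Without the escaping, `$` and `.` would be read as regexp operators and the term would not be matched as typed. The occurrence inside the `title` attribute stays untouched.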

Tibos
  • It doesn't work, please test it with color, id, style inside the span http://jsfiddle.net/TPg9p/6/ – Thanh Trung Nov 25 '13 at 15:45
  • Works perfectly if you actually use the global modifier of the regexp class. http://jsfiddle.net/TPg9p/7/ – Tibos Nov 25 '13 at 15:52
  • Your version works great and is much easier to understand than raina's. It also works recursively for all descendants. Can I ask you to add escaping for the regexp? Strings such as ^()\w\d$ can appear inside that HTML. Then I can give you credit. Also, it will look weird if the user searches for `<` or `>` – Thanh Trung Nov 25 '13 at 16:04
  • Done. `<` and `>` are not valid characters inside your search area. – Tibos Nov 25 '13 at 16:25
  • Thanks for your help. But the `>` still causes some trouble: http://jsfiddle.net/TPg9p/9/ I'll stick with raina77ow's solution, since its nodeType check seems much safer. – Thanh Trung Nov 25 '13 at 22:33
  • I tried to search for multiple words such as `id color` and it doesn't work, simply because they belong to different nodes – Thanh Trung Nov 27 '13 at 11:04
  • I'm not sure I understand the problem. In the demo, typing id in the input field and clicking highlight colored both occurrences. There are no occurrences of "id color", but if you type "color id", it highlights the occurrence correctly. If you want the search string to be split into separate words, then I'm afraid I won't help you, because that is way beyond the scope of the original question. – Tibos Nov 27 '13 at 11:12
  • `id color` are 2 words right after `color:red`. `id` belongs to the `<li>` node while `color` belongs to the `<span>` node. It's true that the original question asked to search for a single word, but that was just an example. Imagine the user searches for a whole sentence: then the whole sentence should be highlighted. So if I search for `id color`, it should match that pair of words and not `id` and `color` separately. Your version works a bit better than raina77ow's because it can match text that belongs to different text nodes. – Thanh Trung Nov 27 '13 at 13:34
  • Indeed. I solved the problem here http://jsfiddle.net/TPg9p/10/ by inserting `(?:<.*?>)?` (possibly any tag) between all the letters in the search string. I didn't make it work with the escaping part, that's for you to do. This is starting to feel a lot less like "I need some help with this" and more like "here are the specs, do it!", so I will no longer attempt to resolve any issues you may have. – Tibos Nov 27 '13 at 13:45
  • Thanks for your big help. It works now! Sorry if I was being rude or anything. I was just trying to point out whether the script is fully working or not. – Thanh Trung Nov 27 '13 at 14:23
  • All good. I was just pointing out that SO is thought of more as offering a push in the right direction than as giving a working solution, and it seemed like the effort you put into trying to solve the other problems yourself was less than adequate. In any case, I'm glad to have been of some help. – Tibos Nov 27 '13 at 14:55
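The approach from the comments (an optional tag between every two letters of the search string) can be combined with the escaping step roughly as follows. This is a sketch under assumptions, not the code from the linked fiddle: `buildCrossTagPattern` is a made-up name, and `<[^>]*>` is used here in place of `<.*?>` so that a tag containing a line break is still accepted.

```javascript
// Same escaping expression as quoted in the answer above.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: builds a pattern in which one optional tag may sit
// between any two characters of the keyword. Each character is escaped
// separately, so the inserted groups are never escaped themselves.
function buildCrossTagPattern(keyword) {
    var optionalTag = '(?:<[^>]*>)?';
    var body = Array.from(keyword).map(escapeRegExp).join(optionalTag);
    // The lookahead still keeps matches out of tag names and attributes.
    return new RegExp(body + '(?=[^>]*(?:<|$))', 'g');
}

// Illustrative input, shortened from the sample in the question.
var html = '<li>style id <span id="toto">color id</span> abcde</li>';

// "id" and "color" sit in different text nodes, yet the pattern spans both.
var found = html.match(buildCrossTagPattern('id color'));
```

Note that a match found this way contains the opening `<span id="toto">` tag, so wrapping it in a single highlight element would produce mis-nested markup. Handling that case is left open here.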
Instead of attempting to get the correct string with the regexes, work with textNodes only:

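$('#submit').click(function () {
    // Escape regexp metacharacters so the search term is matched literally.
    var replacePattern = new RegExp(
        $('#search').val().replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'),
        'gi');
    // Take #sample and its child elements, skipping existing highlights.
    $('#sample').children().addBack().not('.highlight')
      .contents().filter(function() {
        return this.nodeType === 3; // 3 === Node.TEXT_NODE
    }).replaceWith(function(){
        // Only text node data is searched, so tags and attributes are never touched.
        return this.data.replace(replacePattern, '<b class="highlight">$&</b>');
    });
});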

Demo.

Explanation: first you collect the #sample element and its descendants (direct children only, if children() is used; it's possible to use find('*') instead to reach all descendants). Then .highlight elements are filtered out of that selection. This is optional, but it made little sense to me to highlight within something that's already highlighted.

After you have all the elements (to be processed), you collect all their children with .contents() - and filter the collection (with nodeType check) so that only text nodes remain there. Finally, you run .replaceWith() over that collection.

Note that the pattern definition is placed outside of the replaceWith callback function (as it basically should be a constant value during a single click handling).
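The same text-node idea can be sketched without jQuery, using the DOM's own TreeWalker. Everything below is an illustration under assumptions: `splitByMatches` and `highlightTextNodes` are made-up names, `root` is assumed to be an element, and the call to `normalize()` is only a guess at how split text nodes left by a previous revert could be merged again.

```javascript
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Pure helper: cuts `text` into pieces and flags the ones that matched.
// `pattern` must carry the 'g' flag.
function splitByMatches(text, pattern) {
    var pieces = [], last = 0, m;
    pattern.lastIndex = 0;
    while ((m = pattern.exec(text)) !== null) {
        if (m[0] === '') { pattern.lastIndex++; continue; } // skip empty matches
        if (m.index > last) pieces.push({ text: text.slice(last, m.index), match: false });
        pieces.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) pieces.push({ text: text.slice(last), match: false });
    return pieces;
}

// Hypothetical DOM version: visits text nodes only, so tags are never searched.
function highlightTextNodes(root, keyword) {
    if (!keyword) return;
    var doc = root.ownerDocument;
    var pattern = new RegExp(escapeRegExp(keyword), 'gi');
    root.normalize(); // merge adjacent text nodes before searching
    var walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
    var textNodes = [];
    while (walker.nextNode()) textNodes.push(walker.currentNode);
    textNodes.forEach(function (node) {
        var parent = node.parentElement;
        if (parent && parent.classList.contains('highlight')) return; // already highlighted
        var pieces = splitByMatches(node.data, pattern);
        if (!pieces.some(function (p) { return p.match; })) return;
        var fragment = doc.createDocumentFragment();
        pieces.forEach(function (p) {
            if (p.match) {
                var b = doc.createElement('b');
                b.className = 'highlight';
                b.textContent = p.text;
                fragment.appendChild(b);
            } else {
                fragment.appendChild(doc.createTextNode(p.text));
            }
        });
        node.parentNode.replaceChild(fragment, node);
    });
}
```

In a browser this could be called as `highlightTextNodes(document.getElementById('sample'), 'span')`. Because matched text is inserted through `textContent` rather than as an HTML string, characters such as `<` in the text are not reinterpreted as markup.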

raina77ow
  • I've tested your program, but it only matches whatever is outside the `span`. http://jsfiddle.net/5J4Gn/1/ For example, `toto` isn't matched. It partially works, but it's not what I expected – Thanh Trung Nov 25 '13 at 14:15
  • Updated the answer, adding the 'children' selector. – raina77ow Nov 25 '13 at 14:32
  • Worked! You're a genius. Even though I don't understand what you are writing :P – Thanh Trung Nov 25 '13 at 15:48
  • I added .find('*') for recursive search – Thanh Trung Nov 25 '13 at 16:08
  • And I've added an explanation. ) Of course, it's possible to adjust this sample; the key point here is how the internal HTML parser could (and, in my opinion, should) be used in this case. – raina77ow Nov 25 '13 at 16:30
  • Thanks for clearing that up. It's easier to understand now :) – Thanh Trung Nov 25 '13 at 22:34
  • Hi again. There's a bug. If you search for the word `am`, revert it, then search again for `ampl`, the script stops working. It's because when you revert it, the text node is split into 2 text nodes and the comparison stops working – Thanh Trung Nov 26 '13 at 11:54