7

If I had a string:

hey <a href="#user">user</a>, what are you doing?

How, with regex, could I say: look for user, but not inside of < and > characters? The match should grab the user between <a> and </a>, but not the one inside the href.

I'd like this to work for any tag, so it shouldn't matter which tags are present.

== Update ==

The reason I can't use .text() or innerText is that this is being used to highlight results, much like the native Cmd/Ctrl+F functionality in browsers, and I don't want to lose formatting. For example, if I search for strong here:

Some <strong>strong</strong> text.

If I use .text(), it'll return "Some strong text", and then I'll wrap strong in a <span> that has a class for styling. But when I go back and try to insert this into the DOM, the <strong> tags will be missing.

Oscar Godson
  • interesting. what are you doing it for? – Benny Tjia Jun 23 '11 at 06:55
  • How are you getting this text? innerHTML? You could try simply getting the text. – kapa Jun 23 '11 at 06:55
  • @Benny for a sort of JS search. I want to search what is visible to the user with like `.highlight('user')` – Oscar Godson Jun 23 '11 at 06:56
  • @bazmegakapa It's going to be a jQuery plugin, but i'd like to know the regex when I import this same concept into a JS library. It's using `$('someelement').html()` in getting the HTML – Oscar Godson Jun 23 '11 at 06:57
  • 1
    @Oscar Great, then use `.text()` and the big problem is solved. Parsing HTML with regex only promises problems for you. – kapa Jun 23 '11 at 06:59
  • @OscarGodson: Are you also setting the HTML again this way? You will lose event handlers bound to elements if you do it that way. – Felix Kling Jun 23 '11 at 07:05
  • @Bazmegakapa see my updated post, that wont work :( – Oscar Godson Jun 23 '11 at 07:12
  • @Felix crap... forgot about that, any ideas? Maybe I need to absolute position these highlights instead? – Oscar Godson Jun 23 '11 at 07:13
  • Oh... I think I might have missed some point. Do you actually want to search for some text inside a **string** or in the DOM? I assumed the latter. – Felix Kling Jun 23 '11 at 08:27
  • Nope http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Sam Greenhalgh Jun 27 '13 at 13:27
  • @SamGreenhalgh If you have a solution let me know. This is 2 years old, but I'm sure others could benefit from any solution, regex or not, rather than that thread. – Oscar Godson Jun 27 '13 at 21:36
  • @OscarGodson I'm afraid, two years on, that answer is still quite relevant, as the 4458 upvotes would suggest. This might explain more http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html . I'd consider using the DOM to find nodes that match what you're looking for and then wrap them in an element using DOM manipulation methods. – Sam Greenhalgh Jun 28 '13 at 08:39

7 Answers

8

If you plan to replace the HTML using html() again, then you will lose all event handlers that might be bound to inner elements, along with their data (as I said in my comment).

Whenever you set the content of an element as HTML string, you are creating new elements.

It might be better to recursively apply this function to every text node only. Something like:

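$.fn.highlight = function(word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
        pattern = new RegExp(escaped, 'g'),
        repl = '<span class="high">' + word + '</span>';

    this.each(function() {
        $(this).contents().each(function() {
            // nodeType 3 is a text node: wrap every match in a span.
            if(this.nodeType === 3 && pattern.test(this.nodeValue)) {
                $(this).replaceWith(this.nodeValue.replace(pattern, repl));
            }
            // Recurse into other nodes, skipping spans that are already highlighted.
            else if(!$(this).hasClass('high')) {
                $(this).highlight(word);
            }
        });
    });
    return this;
};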

DEMO

It could very well be that this is not very efficient though.
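For comparison, here is a minimal sketch of the same text-node idea without jQuery and without building an HTML string. It is not the code from the demo; the names escapeRegExp, splitMatches and highlightTextNode are made up for illustration. The text of a node is split into matching and non-matching pieces, and each piece is turned into either a plain text node or a span with the "high" class used above.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split text into pieces and flag the pieces that equal the search word.
// (Hypothetical helper, case-sensitive.)
function splitMatches(text, word) {
  if (!word) return [{ text: text, match: false }];
  // The capture group makes split() keep the matched word in the result.
  var pattern = new RegExp("(" + escapeRegExp(word) + ")", "g");
  return text
    .split(pattern)
    .filter(function (piece) { return piece !== ""; })
    .map(function (piece) { return { text: piece, match: piece === word }; });
}

// Replace one text node with text nodes plus <span class="high"> wrappers.
// Only DOM methods are used, so the text is never re-parsed as HTML.
function highlightTextNode(node, word) {
  var doc = node.ownerDocument;
  var fragment = doc.createDocumentFragment();
  splitMatches(node.nodeValue, word).forEach(function (piece) {
    if (piece.match) {
      var span = doc.createElement("span");
      span.className = "high";
      span.appendChild(doc.createTextNode(piece.text));
      fragment.appendChild(span);
    } else {
      fragment.appendChild(doc.createTextNode(piece.text));
    }
  });
  node.parentNode.replaceChild(fragment, node);
}
```

highlightTextNode would be called for every text node found while walking the element, in the same way the plugin above recurses through contents(). Because existing elements are left in place, handlers bound to them should survive.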

Felix Kling
  • Do you think searching for the word, wrapping it in a "temp" span, getting the x,y of that span, then creating an element on top of that would be more efficient? – Oscar Godson Jun 23 '11 at 07:42
  • @OscarGodson: No, because you are effectively doing the same (wrapping the word in some tag) *plus* some extra work. Inspecting every text node (recursively) is what takes time. Depending on the selector it can also be that you visit nodes several times. I suggest you test it in various scenarios and try to improve upon it. – Felix Kling Jun 23 '11 at 07:43
2

To emulate Ctrl-F (which I assume is what you're doing), you can use window.find for Firefox, Chrome, and Safari and TextRange.findText for IE.

You should use a feature detect to choose which method you use:

function highlightText(str) {
    if (window.find)
        window.find(str);
    else if (window.TextRange && window.TextRange.prototype.findText) {
        var bodyRange = document.body.createTextRange();
        bodyRange.findText(str);
        bodyRange.select();
    }
}

Then, after the text is selected, you can style the selection with CSS using the ::selection selector.

Edit: To search within a certain DOM object, you could use a roundabout method: use window.find and see whether the selection is in a certain element. (Perhaps say s = window.getSelection().anchorNode and compare s.parentNode == obj, s.parentNode.parentNode == obj, etc.). If it's not in the correct element, repeat the process. IE is a lot easier: instead of document.body.createTextRange(), you can use obj.createTextRange().
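The roundabout method might look roughly like the sketch below. The helper names isInside and findWithin are made up for illustration, and window.find is a non-standard method, so treat this as something to try in the browsers you care about rather than a guaranteed solution. It assumes window.find(str) returns false once there are no further matches.

```javascript
// Walk up the parentNode chain to see whether node sits inside container.
function isInside(node, container) {
  for (var n = node; n; n = n.parentNode) {
    if (n === container) return true;
  }
  return false;
}

// Keep asking the browser for the next match until one falls inside
// the container. Returns true if such a match ends up selected.
function findWithin(str, container) {
  while (window.find(str)) {
    if (isInside(window.getSelection().anchorNode, container)) {
      return true;
    }
  }
  return false;
}
```

The loop replaces the chain of s.parentNode == obj, s.parentNode.parentNode == obj comparisons, so it works at any nesting depth.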

Casey Chu
1
$("body > *").each(function (index, element) {

  var parts = $(element).text().split("needle");
  if (parts.length > 1)
    $(element).html(parts.join('<span class="highlight">needle</span>'));
});

jsbin demo

At this point it's evolving to be more and more like Felix's, so I think he's got the winner.


original:

If you're doing this in javascript, you already have a handy parsed version of the web page in the DOM.

// gives "user"
alert(document.getElementById('user').innerHTML);

or with jQuery you can do lots of nice shortcuts:

alert($('#user').html()); // same as above

$("a").each(function (index, element) {
    alert(element.innerHTML); // shows label text of every link in page
});
Brad Mace
0

Try this:

/[(<.+>)(^<)]*user[(^>)(<.*>)]/

It means:

Before the keyword, you can have any number of <...> groups or non-< characters.

Likewise after it.

EDIT:

The correct one would be:

/((<.+>)|(^<))*user((^>)|(<.*>))*/
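A different pattern that is often used for this kind of job is a negative lookahead. This is a swapped-in technique, not a fix of the expression above: the word is accepted only if the next angle bracket after it is not a closing >. The function name outsideTags is made up for illustration, and the sketch assumes that < and > appear only as tag delimiters (so a bare > in the text or inside an attribute value will confuse it).

```javascript
// Build a global regex that matches `word` only when it is not followed
// by a ">" without a "<" in between, i.e. when it is not inside a tag.
function outsideTags(word) {
  // Escape regex metacharacters so the word is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("(" + escaped + ")(?![^<>]*>)", "g");
}

var s = 'hey <a href="#user">user</a>, what are you doing?';
// $1 is the matched word, so it can be reused in the replacement.
var highlighted = s.replace(outsideTags("user"), '<span class="high">$1</span>');
```

Since the word sits in capture group 1, it is available as $1 in .replace(), which is the form asked about in the comments.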
SteeveDroz
  • Hmm, seems to break it. Look at the generated HTML: http://jsbin.com/ayati5/3/edit – Oscar Godson Jun 23 '11 at 07:28
  • Cool, think its working, but how can I get that in a `.replace()` with `$N` so I could do something like: `str.replace(/((<.+>)|(^<))*user((^>)|(<.*>))*/g,'$1')` – Oscar Godson Jun 23 '11 at 07:51
0

I like regexes, but because tags can be nested, you will have to use a parser. I recommend http://simplehtmldom.sourceforge.net/, which is really powerful and easy to use. If you have well-formed XHTML, you can also use SimpleXML from PHP.

edit: Didn't see the javascript tag.

Leif
  • Doesnt matter if tags are nested.
    a
    b
    c
    wouldn't matter because proper regex would simply look for any string thats NOT between <*>
    – Oscar Godson Jun 23 '11 at 07:46
  • @Oscar - and what if the markup is
    my text
    ?
    – Alohci Jun 23 '11 at 08:36
  • @Oscar: This could result in "a
    b
    c
    ". Or look into that and you get "b
    c
    ". I just thought that was not what you wanted. @Alohci: You have to use entities in this case, e.g. &lt; instead of <.
    – Leif Jun 23 '11 at 09:32
  • @Alochi I don't expect my script, nor would I expect anyone else's scripts to understand malformed HTML. In the 1 in a million edge cases a user tries to name a class an invalid class name I don't expect my script to work. – Oscar Godson Jun 23 '11 at 18:24
  • that'd be an issue if I wasn't looking globally i think, but basically each time I see `<*>` remove it from the search (not the DOM, just the search) – Oscar Godson Jun 23 '11 at 18:25
0

Here is what works, I tried it on your JS Bin:

var s = 'hey <a href="#user">user</a>, what are you doing?';
s = s.replace(/(<[^>]*)user([^<>]*>)/g,'$1NEVER_WRITE_THAT_ANYWHERE_ELSE$2');
s = s.replace(/user/g,'Mr Smith');
s = s.replace(/NEVER_WRITE_THAT_ANYWHERE_ELSE/g,'user');
document.body.innerHTML = s;

It may be a tiny little bit complicated, but it works!

Explanation:

  • You replace "user" that is in the tag (which is easy to find) with a random string of your choice that you must never use again... ever. A good use would be to replace it with its hashcode (md5, sha-1, ...)
  • Replace every remaining occurrence of "user" with the text you want.
  • Replace back your unique string with "user".
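The three steps above can also be avoided by splitting the string at the tags and replacing only in the pieces between them, so no placeholder is needed. This is a rough sketch rather than a full HTML parser: the function name replaceOutsideTags is made up for illustration, and it assumes every tag looks like <...> with no > inside attribute values, comments or scripts.

```javascript
// Replace `word` with `replacement` everywhere except inside <...> tags.
function replaceOutsideTags(html, word, replacement) {
  // Escape regex metacharacters so the word is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var pattern = new RegExp(escaped, "g");
  return html
    // The capture group keeps the tags in the result, at odd indexes.
    .split(/(<[^>]*>)/)
    .map(function (part, i) {
      if (i % 2 === 1) return part; // a tag: leave untouched
      // A function is used so "$" in the replacement is taken literally.
      return part.replace(pattern, function () { return replacement; });
    })
    .join("");
}

var s = 'hey <a href="#user">user</a>, what are you doing?';
var result = replaceOutsideTags(s, "user", "Mr Smith");
```

Because the tags are never touched, multiple occurrences of the word inside one tag (for example in both href and class) are left alone as well.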
SteeveDroz
-1

This code will strip all tags from the string:

var s = 'hey <a href="#user">user</a>, what are you doing?';
s = s.replace(/<[^<>]+>/g,'');
Dim_K
  • 3
    [No. It's wont.](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege) – Brad Mace Jun 23 '11 at 06:56
  • @Dim_K : you did. In your example, the value of `s` will be `'hey user, what are you doing?`. – SteeveDroz Jun 23 '11 at 07:01
  • @Dim - that link is a whole compilation of examples – Brad Mace Jun 23 '11 at 07:01
  • @Oltarus yes it is right. Regexp save only visible text. What is wrong? – Dim_K Jun 23 '11 at 07:02
  • @Dim_K you didn't isolate `user`... You stripped HTML. – SteeveDroz Jun 23 '11 at 07:06
  • @Oltarus title is about visible text. i think `user` just explanation – Dim_K Jun 23 '11 at 07:12
  • @Dim_K Nope, I think @Oscar Godson is looking for some precise text which is in the string but **not** in the HTML. not for "Anything that is non-HTML". – SteeveDroz Jun 23 '11 at 07:14
  • Sorry for the confusion. I don't want to strip HTML, i want a result from a string, but don't search HTML tags. So, with your current code, if you had the right regex, it'd return `"hey , what are you doing?"` – Oscar Godson Jun 23 '11 at 07:21
  • But ill replace it with like "`$1`" which would make the resulting HTML `hey user, what are you doing?` -- Does that make more sense? – Oscar Godson Jun 23 '11 at 07:23