I'm getting every text node on the page that contains "em"

I'm then replacing the html (i.e. the text) of the parent of the text node with new HTML.

Try the code below in your browser's console. It should highlight every occurrence of "em" on the current page.

But it doesn't. It messes up the whole page instead. What am I doing wrong?

I thought :contains looks for text nodes, not HTML.

var someString = "em";
$(':contains("' + someString + '")').each(function ()
{
  var oldHTML = $(this).html();
  var regex = new RegExp(someString, "g");
  var newHTML = oldHTML.replace(regex, '<span style="background-color: yellow;">' + someString + '</span>');
  $(this).html(newHTML);
});

I'm sorry I couldn't find a more intuitive title for my weird problem - please feel free to change it if you can! :)

Thanks!

EDIT :

It turns out my .replace(...) is also replacing html tags, not just plain text.
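To make that concrete, here is a small sketch on a made-up HTML string (`sampleHTML` is an assumption for illustration, not markup taken from any real page). Since `.html()` returns markup as a string, the regex matches "em" inside tag names as well as inside the text:

```javascript
// Hypothetical markup standing in for what $(this).html() might return.
var sampleHTML = '<em>item</em>';
var someString = 'em';
var regex = new RegExp(someString, 'g');
var wrapped = '<span style="background-color: yellow;">' + someString + '</span>';
var brokenHTML = sampleHTML.replace(regex, wrapped);

// The tag names in <em> and </em> contain "em" too, so they get rewritten
// along with the "em" inside the word "item".
console.log(brokenHTML);
```

The result no longer starts with a valid `<em>` tag, which is the kind of damage that breaks the page.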

I should've been doing something like (although I'm not sure it's the right regex) :

var regex = new RegExp("<.*>(" + someString + ")\/.*>", "g");
var newHTML = oldHTML.replace(regex, '<span style="background-color: yellow;">' + someString + '</span>');
Kawd
  • Seems to work for me. http://jsfiddle.net/zUMB9/. What is your html like? – ced-b Feb 20 '14 at 00:41
  • jsfiddle is not reliable because it's never under "real conditions" - Try it on the page we're on right now. Open up the browser's console and run the code. – Kawd Feb 20 '14 at 00:42
  • It's because you aren't looking for just plain text. You don't want the html elements. – ps2goat Feb 20 '14 at 00:45
  • Oh I see - my replace function also wraps html tags not just plain text - damn You're right! thanks! :) It'd be cool if I could come up with a regex that will only replace plain text.. let's see.. – Kawd Feb 20 '14 at 00:48
  • Using regexp for this is really pretty ugly. You have a full DOM API at your disposal that'll allow you to directly target text nodes. Why not make use of it. The DOM isn't a string after all. It's a hierarchy of objects. Your `:contains` selection is going to give you every ancestor of every occurrence of the text string, including the `body` and `documentElement`. – cookie monster Feb 20 '14 at 00:57

1 Answer

Adding my two cents as an official answer: It's because you aren't looking for just plain text. You don't want the html elements.

Check here for some more gotchas: regular expression to extract text from HTML

You may be better off doing the highlighting on the server, where you can easily tell what is text.
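If it has to happen in the browser, one option along the lines of the DOM suggestion in the comments is to skip the regex-on-HTML idea and only touch text nodes. The following is a rough sketch, not a tested drop-in: the names `splitOnMatches` and `highlightTextNodes` are made up here, and it assumes a browser with `document.createTreeWalker`.

```javascript
// Split plain text into pieces, flagging the pieces equal to `needle`.
function splitOnMatches(text, needle) {
  if (!needle) return [{ text: text, match: false }]; // avoid looping forever on ""
  var pieces = [];
  var start = 0;
  var index = text.indexOf(needle, start);
  while (index !== -1) {
    if (index > start) pieces.push({ text: text.slice(start, index), match: false });
    pieces.push({ text: needle, match: true });
    start = index + needle.length;
    index = text.indexOf(needle, start);
  }
  if (start < text.length) pieces.push({ text: text.slice(start), match: false });
  return pieces;
}

// Browser only: wrap each match found in the text nodes under `root`.
function highlightTextNodes(root, needle) {
  var walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  // Collect first; replacing nodes while walking would disturb the walker.
  while (walker.nextNode()) {
    var parentTag = walker.currentNode.parentNode.nodeName;
    if (parentTag !== 'SCRIPT' && parentTag !== 'STYLE') textNodes.push(walker.currentNode);
  }
  textNodes.forEach(function (node) {
    var pieces = splitOnMatches(node.nodeValue, needle);
    var hasMatch = pieces.some(function (piece) { return piece.match; });
    if (!hasMatch) return; // nothing to highlight in this text node
    var fragment = document.createDocumentFragment();
    pieces.forEach(function (piece) {
      if (piece.match) {
        var span = document.createElement('span');
        span.style.backgroundColor = 'yellow';
        span.textContent = piece.text;
        fragment.appendChild(span);
      } else {
        fragment.appendChild(document.createTextNode(piece.text));
      }
    });
    node.parentNode.replaceChild(fragment, node);
  });
}
```

In the console you would call something like `highlightTextNodes(document.body, "em")`. Because tag names and attributes never pass through a string replace, the markup itself stays untouched.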

ps2goat
  • You're right - Would something similar (i.e. correct) to the RegEx I'm attempting in my EDIT solve the problem ? – Kawd Feb 20 '14 at 00:56
  • As stated in the other post, browsers allow inferior html so it would be nigh impossible to catch everything with a regex. However, if you are 100% sure you and everyone on your team can create perfect code, you may be able to swing it. I guarantee nothing. =) – ps2goat Feb 20 '14 at 01:02