0

I am looking for a way to wrap every word in an HTML string in a tag. I have tried splitting the text on spaces, then iterating through the words and replacing them, but the problem is that some words do not start or end with a space (for example, at a paragraph boundary). Is there a regex that can help, or some other creative method?

For example, take this HTML string:

<h1>Lorem ipsum dolor sit amet</h1>
<p>consectetur adipisicing elit</p>
<p>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
<p>Ut enim ad minim veniam</p>

Here is the code I have so far, which does not work well enough:

var html = $("#text").html();
var text = $("#text").text();
var words = text.split(' ');
for (var i = 0; i < words.length; i++) {
    html = html.replace(words[i], '<span style="color: red;">' + words[i] +'</span>');
}
$("#text").html(html);

The jsfiddle: http://jsfiddle.net/nd6a3/3/

Light
  • so in the example text what is the expected output? – Liam Oct 16 '13 at 10:47
  • 3
    You've given us a "before changes" string. Can you show us what you want the "after changes" string to look like? – h2ooooooo Oct 16 '13 at 10:47
  • The output should have each word wrapped in a <span>. I don't want to write the full output in my question because the text would be too long :) – Light Oct 16 '13 at 10:50
  • 1
    You appear to be wrapping each word with a span. Why not just wrap the whole sentence instead? – Andy Oct 16 '13 at 10:50
  • @Andy I am going to use it for a more complicated task after the wrapping (getting the position of each word) – Light Oct 16 '13 at 10:55

4 Answers

2
var text = "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
var words = text.match(/\w+/g);
// Or text.match(/\b([^\s]+?)\b/g) to support any non-standard characters.

words contains an array of all the words in the string text.

["sed", "do", "eiusmod", "tempor", "incididunt", "ut", "labore", "et", "dolore", "magna", "aliqua"]

From there on you can use your loop to replace the words.

Broxzier
  • Thanks, my text will also contain special characters like Hebrew, Arabic, Latin, etc. It seems that this regex doesn't work on those. Is it possible to write the regex so that it matches those characters as well? – Light Oct 16 '13 at 10:58
  • @Light You could check for the word borders using `/\b([^\s]+?)\b/g`. – Broxzier Oct 16 '13 at 12:04
2

It's better to use a structured approach when working with HTML. Plain regexes are too dumb for that: they cannot tell markup from text.

$("#text *").contents().filter(function() {
    // Keep only text nodes (nodeType 3)
    return this.nodeType === 3;
}).replaceWith(function() {
    // Wrap each word of the text node in <u>
    return this.nodeValue.replace(/\b(\w+)\b/g, "<u>$1</u>");
});

http://jsfiddle.net/XhwMY/
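
One caveat with returning a string from replaceWith: nodeValue holds the decoded text, so a literal < or & in the text could be re-parsed as markup. Below is a minimal sketch of a helper that wraps the words and escapes everything else. The names escapeHtml and wrapWords are hypothetical and not part of jQuery.

```javascript
// Hypothetical helpers for illustration; neither is a jQuery function.
function escapeHtml(s) {
    return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function wrapWords(text) {
    // Split the text into alternating word runs (\w+) and non-word runs (\W+).
    return text.replace(/\w+|\W+/g, function (token) {
        // Word runs contain no special characters, so only the rest is escaped.
        return /^\w/.test(token) ? "<u>" + token + "</u>" : escapeHtml(token);
    });
}

console.log(wrapWords("a < b & c"));
// <u>a</u> &lt; <u>b</u> &amp; <u>c</u>
```

It could be plugged into the snippet above as .replaceWith(function() { return wrapWords(this.nodeValue); }).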

Regarding your comment about finding words in Hebrew, Arabic, etc.: JavaScript's \w does not cover those scripts; \w+ only matches ASCII letters, digits, and the underscore. A workaround is to use explicit Unicode character ranges. For Hebrew, for example, the expression looks like this:

this.nodeValue.replace(/[\w\u0590-\u05FF]+/g, "<u>$&</u>")

This tool will help you to find the ranges you need.
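
If the target engines support ES2018 Unicode property escapes (an assumption; this was not available when the answer was written), a character class built from \p{...} with the "u" flag can stand in for hand-picked ranges. This is a sketch, and the function name wrapUnicodeWords is hypothetical:

```javascript
// Assumes an ES2018+ engine (Unicode property escapes require the "u" flag).
// \p{L} = any letter, \p{M} = combining marks (e.g. Hebrew vowel points),
// \p{N} = any numeric character; "_" is kept to stay close to \w.
var UNICODE_WORD = /[\p{L}\p{M}\p{N}_]+/gu;

function wrapUnicodeWords(text) {
    // $& inserts the whole match, so no capture group is needed.
    return text.replace(UNICODE_WORD, "<u>$&</u>");
}

console.log(wrapUnicodeWords("שלום עולם, hello мир"));
```

Each Hebrew, Latin, and Cyrillic word is wrapped separately, while the comma and the spaces stay outside the tags.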

georg
  • Thanks! How can I update the regex so it will also match non-English characters like Hebrew, Russian, Latin, etc? – Light Oct 16 '13 at 11:16
  • I am not the best with regex, can you please write the full replace method including the Hebrew characters? – Light Oct 16 '13 at 11:24
  • There is only one problem now. If the word has punctuation marks like '.' or '...', the regex doesn't include these characters in the word. Can you please update the regex so it will include punctuation marks in the word? I have updated your JSFiddle as an example. – Light Oct 16 '13 at 14:30
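
One possible reading of that last request is to treat every run of non-whitespace characters as a word, so that trailing punctuation stays inside the wrapper. Here is a sketch on plain text; the function name wrapTokens is hypothetical, and in the answer's snippet the same replace would apply to each text node's nodeValue:

```javascript
function wrapTokens(text) {
    // \S+ matches any run of non-whitespace characters, punctuation included.
    return text.replace(/\S+/g, "<u>$&</u>");
}

console.log(wrapTokens("Ut enim... ad minim veniam."));
// <u>Ut</u> <u>enim...</u> <u>ad</u> <u>minim</u> <u>veniam.</u>
```

Because \S does not depend on \w, this also wraps Hebrew or Arabic words without listing character ranges, at the cost of no longer separating words from adjacent punctuation.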
1

You can try with the following regex:

$("#text").html(function(i, oldHtml) {
    // Match runs of non-space, non-bracket characters that are not inside a tag
    return oldHtml.replace(/([^\s<>]+)(?![^<>]*>)/g, "<span style='color: red;'>$1</span>");
});

Here's a fiddle for you: http://jsfiddle.net/xbcLt/1/

EDIT:
As you can see in the above code, everything can be wrapped in one handler function passed as the argument to jQuery's .html(). I also updated the link to the fiddle to match the updated code.
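
To see how a tag-skipping lookahead behaves without a DOM, here is a sketch on a plain string. The pattern and the name wrapOutsideTags are assumptions for illustration and not necessarily the exact expression used in the fiddle:

```javascript
function wrapOutsideTags(html) {
    // [^\s<>]+     a run of characters that are neither whitespace nor angle brackets
    // (?![^<>]*>)  reject the run if a ">" is reached before the next "<",
    //              which means the run sits inside a tag
    return html.replace(/([^\s<>]+)(?![^<>]*>)/g, "<span>$1</span>");
}

console.log(wrapOutsideTags('<p class="a b">Ut enim</p>'));
// <p class="a b"><span>Ut</span> <span>enim</span></p>
```

The lookahead assumes that ">" only ever closes a tag. A raw ">" inside an attribute value would confuse it, and entities such as &amp; are wrapped like ordinary words.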

matewka
0

Simply replace every match of /\w+/g with <span style="color: red;">$&</span> (in JavaScript, $& stands for the matched text), like so:

var str = 'Lorem ipsum dolor sit amet\n' +
'consectetur adipisicing elit\n' +
'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n' +
'Ut enim ad minim veniam\n';

str = str.replace(/\w+/g, function(match) { return '<span style="color: red;">' + match + '</span>' });

Which will result in the following output:

<span style="color: red;">Lorem</span> <span style="color: red;">ipsum</span> <span style="color: red;">dolor</span> <span style="color: red;">sit</span> <span style="color: red;">amet</span>
<span style="color: red;">consectetur</span> <span style="color: red;">adipisicing</span> <span style="color: red;">elit</span>
<span style="color: red;">sed</span> <span style="color: red;">do</span> <span style="color: red;">eiusmod</span> <span style="color: red;">tempor</span> <span style="color: red;">incididunt</span> <span style="color: red;">ut</span> <span style="color: red;">labore</span> <span style="color: red;">et</span> <span style="color: red;">dolore</span> <span style="color: red;">magna</span> <span style="color: red;">aliqua</span>.
<span style="color: red;">Ut</span> <span style="color: red;">enim</span> <span style="color: red;">ad</span> <span style="color: red;">minim</span> <span style="color: red;">veniam</span>

Note: This will only work with text. If you use this on HTML it will also turn <h1> into <<span style="color: red;">h1</span>>.

h2ooooooo