Regexp to wrap each word on HTML page

Question

Is it possible to wrap each word on an HTML page in a span element? I'm trying something like

/(\s*(?:<\/?\w+[^>]*>)|(\b\w+\b))/g

but the results are far from what I need.

Thanks in advance!

You really [shouldn't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454) – Joseph Marikle, Aug 21 '11 at 21:28
You can't parse HTML with regex, only Chuck Norris can. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – stewe, Aug 21 '11 at 21:29
Of course [you can use regexes to parse HTML](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491). In fact, sometimes you even *should*. However, JavaScript has some of the **most horrible regexes** of any programming language anywhere. The `XRegExp` plugin helps, but it still sucks. It's easier to teach a pig to sing, and less annoying. Either do all Real™ work server-side where you can use a Real™ programming language, or else be prepared to improvise a 6-voice fugue for unaccompanied porcine chorus. – tchrist, Aug 22 '11 at 00:16
Thanks guys, it seems I need to look in the direction of getting all text nodes and working with them. – Roman, Aug 22 '11 at 07:45

jAndy · Accepted Answer · 2011-08-22T08:08:38.887

Well, I won't ask for the reason; you could do it like this:

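function getChilds( nodes ) {
    var len = nodes.length;

    // Walk backwards so that replacing a node never shifts an index we still have to visit.
    while( len-- ) {
        var node = nodes[len];

        if( node.nodeType === 3 ) {
            // Leave whitespace-only text nodes (indentation, line breaks) alone.
            if( !/\S/.test( node.nodeValue ) ) { continue; }

            // Build the spans in a fragment and swap it in where the text node
            // was, so the words keep their position among the siblings.
            var frag = document.createDocumentFragment();

            node.nodeValue.split(/\s+/).forEach(function( word ) {
                // A leading or trailing empty string becomes a span holding one space.
                var s = document.createElement('span');
                s.textContent = word + ' ';

                frag.appendChild(s);
            });

            node.parentNode.replaceChild( frag, node );
        } else if( node.childNodes && node.childNodes.length ) {
            getChilds( node.childNodes );
        }
    }
}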

getChilds( document.body.childNodes );

I have to admit I haven't tested the code yet, though. That was just the first thing that came to my mind. It might be buggy or screw up completely, but in that case I know the gentle and kind Stack Overflow community will kick my ass and downvote like hell :-p
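One thing the loop above does not address is whitespace: splitting on `/\s+/` and appending `word + ' '` collapses each run of whitespace to a single space and puts that space inside the span. Below is a minimal sketch of one way to keep the original whitespace outside the spans. The helper names `tokenize` and `wrapTextNode` are made up for this illustration and are not part of the answer.

```javascript
// Split text into alternating word and whitespace tokens, keeping the whitespace.
function tokenize(text) {
    // The capturing group makes split() keep the separators in the result;
    // the filter drops the empty strings split() emits at the edges.
    return text.split(/(\s+)/).filter(function (token) {
        return token !== '';
    });
}

// Replace one text node with spans (for words) and text nodes (for whitespace).
// Hypothetical helper: call it for each text node you want to wrap.
function wrapTextNode(node) {
    var doc = node.ownerDocument;
    var frag = doc.createDocumentFragment();

    tokenize(node.nodeValue).forEach(function (token) {
        if (/\S/.test(token)) {
            var span = doc.createElement('span');
            span.textContent = token;
            frag.appendChild(span);
        } else {
            // Whitespace stays a plain text node, exactly as it was written.
            frag.appendChild(doc.createTextNode(token));
        }
    });

    node.parentNode.replaceChild(frag, node);
}
```

Because the tokens concatenate back to the original string, the page should render the same before and after wrapping, which matters inside `pre` elements or anywhere `white-space` is not the default.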

Why this line: `var each = Array.prototype.forEach;`? There doesn't seem to be a point to it. – Brock Adams, Aug 22 '11 at 02:05
Yeah, the first line is confusing, could you explain this? Anyway, with some modification this solved my problem. Thanks! – Roman, Aug 22 '11 at 07:54
@Brock: yay, you're right. That's a leftover from an earlier version. I'll remove it. – jAndy, Aug 22 '11 at 08:08

score 2 · Answer 2 · edited May 23 '17 at 12:19

You're going to have to get down to the "Text" nodes to make this happen. Without making it specific to a tag, you really need to traverse every element on the page, wrap its text, and re-append it.

With that said, try something like what a garble post makes use of (minus the filter for words with 4+ characters and the shuffling of the letters).

answered Aug 21 '11 at 22:11 · Brad Christie

That was a fun topic, wasn't it? – qwertymk Aug 21 '11 at 22:28

score 1 · Answer 3 · answered Aug 21 '11 at 22:17

To get all the words inside span tags on the current page, you can use:

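var spans = document.body.getElementsByTagName('span');
// getElementsByTagName always returns a collection, so test its length
if (spans.length)
{
  // Index loop: for...in on an HTMLCollection also visits "length", "item", etc.
  for (var i = 0; i < spans.length; i++)
  {
    // Only report spans whose content consists solely of word characters (or "*")
    if (spans[i].innerHTML && !/[^\w*]/.test(spans[i].innerHTML))
    {
      alert(spans[i].innerHTML);
    }
  }
}
else
{
  alert('span tags not found');
}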

answered Aug 21 '11 at 22:17 · Victor

My understanding is not to filter based on if they're already in a span, but to make every word itself get wrapped in a new span. ...maybe I'm misinterpreting? – Brad Christie Aug 21 '11 at 22:22

score 1 · Answer 4 · answered Aug 21 '11 at 22:50

You should probably start off by getting all the text nodes in the document, and working with their contents instead of on the HTML as a plain string. It really depends on the language you're working with, but you could usually use a simple XPath like //text() to do that.

In JavaScript, that would be `document.evaluate('//text()', document.body, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)`, then iterating over the results and working with each text node separately.
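As a rough sketch of that iteration: a snapshot result exposes `snapshotLength` and `snapshotItem(i)`, and it stays valid while the DOM is modified, which is why it suits this job. The function name `textNodesViaXPath` and its `doc` parameter are assumptions made for this example. Note that it uses `.//text()` rather than `//text()`: a path starting with `//` is resolved from the document root, so with it the `document.body` context node would not limit the search to the body.

```javascript
// Gather the text nodes under `root` into a plain array via an XPath snapshot.
// `doc` is the owning document (normally the global `document`).
function textNodesViaXPath(doc, root) {
    // './/text()' selects text nodes that are descendants of `root`.
    var snapshot = doc.evaluate('.//text()', root, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    var nodes = [];

    for (var i = 0; i < snapshot.snapshotLength; i++) {
        nodes.push(snapshot.snapshotItem(i));
    }
    return nodes;
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== 'undefined') {
    textNodesViaXPath(document, document.body).forEach(function (node) {
        console.log(JSON.stringify(node.nodeValue));
    });
}
```

Each node in the returned array can then be split and wrapped on its own, without worrying that the wrapping invalidates the rest of the list.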

score 1 · Answer 5 · answered Aug 22 '11 at 05:39

See demo

Here's how I did it; it may need some tweaking...

var wrapWords = function(el) {
    // Elements whose text should be left alone
    var skipTags = { style: true, script: true, iframe: true, a: true },
        child, tag;

    // Iterate backwards: nodes created below are inserted after index i
    for (var i = el.childNodes.length - 1; i >= 0; i--) {
        child = el.childNodes[i];
        if (child.nodeType == 1) {
            tag = child.nodeName.toLowerCase();
            if (!(tag in skipTags)) { wrapWords(child); }
        } else if (child.nodeType == 3 && /\w+/.test(child.textContent)) {
            var si, spanWrap;
            while ((si = child.textContent.indexOf(' ')) >= 0) {
                if (si == 0) {
                    // Leading space: split it off and leave it unwrapped
                    child.splitText(1);
                    child = child.nextSibling;
                } else {
                    // Split before the space and wrap the word in a span
                    child.splitText(si);
                    spanWrap = document.createElement("span");
                    // textContent, not innerHTML, so "<" and "&" in the text stay literal
                    spanWrap.textContent = child.textContent;
                    child.parentNode.replaceChild(spanWrap, child);
                    child = spanWrap.nextSibling;
                }
            }
            // Wrap whatever is left after the last space, unless it is empty
            if (child != null && child.textContent) {
                spanWrap = document.createElement("span");
                spanWrap.textContent = child.textContent;
                child.parentNode.replaceChild(spanWrap, child);
            }
        }
    }
};

wrapWords(document.body);
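The skip list can also be used on its own, in a first pass that only collects the text nodes and leaves the wrapping for a second pass, so the tree is not modified while it is being walked. The following is a sketch under that assumption; `SKIP_TAGS` and `collectTextNodes` are hypothetical names, and the tag list is simply copied from the code above.

```javascript
// Lowercase tag names whose text is left untouched (same list as above).
var SKIP_TAGS = { style: true, script: true, iframe: true, a: true };

// Return the non-blank text nodes under `el` in document order,
// without descending into elements listed in SKIP_TAGS.
function collectTextNodes(el, out) {
    out = out || [];

    for (var i = 0; i < el.childNodes.length; i++) {
        var child = el.childNodes[i];

        if (child.nodeType === 1) {
            var tag = child.nodeName.toLowerCase();
            if (!Object.prototype.hasOwnProperty.call(SKIP_TAGS, tag)) {
                collectTextNodes(child, out);
            }
        } else if (child.nodeType === 3 && /\S/.test(child.nodeValue)) {
            out.push(child);
        }
    }
    return out;
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== 'undefined') {
    console.log(collectTextNodes(document.body).length + ' text nodes to wrap');
}
```

Since the function only reads `nodeType`, `nodeName`, `nodeValue` and `childNodes`, it can be tried out on plain objects shaped like DOM nodes before running it against a real page.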
