0

Is it possible to wrap each word on an HTML page in a span element? I'm trying something like:

/(\s*(?:<\/?\w+[^>]*>)|(\b\w+\b))/g

but the results are far from what I need.

Thanks in advance!

Roman
  • 6
    You really [shouldn't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454) – Joseph Marikle Aug 21 '11 at 21:28
  • 2
    You can't parse HTML with regex, only Chuck Norris can. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – stewe Aug 21 '11 at 21:29
  • 2
    Of course [you can use regexes to parse HTML](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491). In fact, sometimes you even *should*. However, JavaScript has some of the **most horrible regexes** of any programming language anywhere. The `XRegExp` plugin helps, but it still sucks. It's easier to teach a pig to sing, and less annoying. Either do all Real™ work serverside where you can use a Real™ programming language, or else be prepared to improvise a 6-voice fugue for unaccompanied porcine chorus. – tchrist Aug 22 '11 at 00:16
  • Thanks guys, it seems I need to look in the direction of getting all text nodes and working with them. – Roman Aug 22 '11 at 07:45

5 Answers

2

Well, I won't ask why you need this, but you could do it like this:

function getChilds( nodes ) {
    var len = nodes.length;

    // Walk backwards so spans inserted into the live NodeList are never revisited.
    while( len-- ) {
        var node = nodes[len];

        if( node.childNodes && node.childNodes.length ) {
            getChilds( node.childNodes );
        }

        // Only split text nodes that contain something other than whitespace.
        if( node.nodeType === 3 && /\S/.test( node.nodeValue ) ) {
            var parent = node.parentNode;

            node.nodeValue.split(/\s+/).forEach(function( word ) {
                if( !word ) { return; } // empty string from leading/trailing whitespace

                var s = document.createElement('span');
                s.textContent = word + ' ';

                // Insert before the text node to keep the words in document order.
                parent.insertBefore( s, node );
            });

            parent.removeChild( node );
        }
    }
}

getChilds( document.body.childNodes );

I have to admit I haven't tested the code yet, though. It was just the first thing that came to mind. It might be buggy or fail completely, but in that case I know the gentle and kind Stack Overflow community will kick my ass and downvote like hell :-p

jAndy
  • 1
    Why this line: `var each = Array.prototype.forEach;`? there doesn't seem to be a point to it. – Brock Adams Aug 22 '11 at 02:05
  • Yeah, first line is confusing, could you explain this? Anyway, with some modification this solved my problem. Thanks! – Roman Aug 22 '11 at 07:54
  • @Brock: yay, you're right. That's a leftover from an earlier version. I'll remove it. – jAndy Aug 22 '11 at 08:08
2

You're going to have to get down to the "Text" nodes to make this happen. Without making it specific to a tag, you really have to traverse every element on the page, wrap each word in its text nodes, and re-append the result.

With that said, try something like what a garble post makes use of (minus the filters for words with 4+ characters and the letter shuffling).
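For what it's worth, here is a rough sketch of that "get down to the text nodes" idea. It uses a TreeWalker to collect the text nodes, which is my own choice and not something the garble post necessarily does. The names `splitWords` and `wrapWordsIn` are made up for illustration, and skipping `script`/`style` content is an assumption about what you'd want.

```javascript
// Split text into alternating word / whitespace pieces, keeping the whitespace
// so the page layout does not change.
function splitWords(text) {
    return text.split(/(\s+)/).filter(function (piece) {
        return piece.length > 0;
    });
}

// Hypothetical helper: wrap every word found under `root` in its own <span>.
function wrapWordsIn(root) {
    var doc = root.ownerDocument;
    var walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT, null);
    var textNodes = [];
    var node;

    // Collect first and modify afterwards, so the walker never sees our changes.
    while ((node = walker.nextNode())) {
        // Assumption: text inside <script> and <style> should be left alone.
        if (!/^(script|style)$/i.test(node.parentNode.nodeName)) {
            textNodes.push(node);
        }
    }

    textNodes.forEach(function (textNode) {
        var fragment = doc.createDocumentFragment();

        splitWords(textNode.nodeValue).forEach(function (piece) {
            if (/\S/.test(piece)) {
                // A word: wrap it.
                var span = doc.createElement('span');
                span.textContent = piece;
                fragment.appendChild(span);
            } else {
                // Whitespace: keep it as plain text between the spans.
                fragment.appendChild(doc.createTextNode(piece));
            }
        });

        textNode.parentNode.replaceChild(fragment, textNode);
    });
}
```

In a browser you would call it as `wrapWordsIn(document.body);`. Note that "word" here just means a run of non-whitespace characters, so punctuation stays attached to the word next to it.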

Brad Christie
1

To get all words inside span tags on the current page, you can use:

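var spans = document.body.getElementsByTagName('span');
if (spans.length)
{
  // Index loop: for...in on an HTMLCollection also visits "length", "item", etc.
  for (var i = 0; i < spans.length; i++)
  {
    // Only report spans whose content is a single word
    if (/^\w+$/.test(spans[i].innerHTML))
    {
      alert(spans[i].innerHTML);
    }
  }
}
else
{
  alert('span tags not found');
}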
Victor
  • 1
    My understanding is not to filter based on if they're already in a span, but to make every word itself get wrapped in a new span. ...maybe I'm misinterpreting? – Brad Christie Aug 21 '11 at 22:22
1

You should probably start off by getting all the text nodes in the document, and working with their contents instead of on the HTML as a plain string. It really depends on the language you're working with, but you could usually use a simple XPath like //text() to do that.

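In JavaScript, that would be document.evaluate('.//text()', document.body, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null) (the leading dot restricts the search to document.body; a bare //text() always starts from the document root), then iterating over the results and working with each text node separately.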
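A minimal sketch of the iteration part, in case it helps. The helper names `snapshotToArray` and `getTextNodes` are invented here, and the `[normalize-space()]` predicate (which drops whitespace-only text nodes) is my own addition rather than part of the expression above.

```javascript
// Copy an XPath snapshot result into a plain array, so the nodes can be
// modified freely afterwards.
function snapshotToArray(snapshot) {
    var items = [];
    for (var i = 0; i < snapshot.snapshotLength; i++) {
        items.push(snapshot.snapshotItem(i));
    }
    return items;
}

// Hypothetical helper: return every non-blank text node under `root`.
function getTextNodes(root) {
    var doc = root.ownerDocument;
    // The leading "." makes the expression relative to `root`.
    var result = doc.evaluate('.//text()[normalize-space()]', root, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    return snapshotToArray(result);
}
```

In a browser, `getTextNodes(document.body)` gives you an array of text nodes that you can then split and wrap one at a time. A snapshot result stays usable even when the document is changed while you loop over it, which is why the snapshot type fits this job.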

shesek
1

See demo

Here's how I did it, may need some tweaking...

var wrapWords = function(el) {
    var skipTags = { style: true, script: true, iframe: true, a: true },
        child, tag;

    for (var i = el.childNodes.length - 1; i >= 0; i--) {
        child = el.childNodes[i];
        if (child.nodeType == 1) {
            tag = child.nodeName.toLowerCase();
            if (!(tag in skipTags)) { wrapWords(child); }
        } else if (child.nodeType == 3 && /\w+/.test(child.textContent)) {
            var si, spanWrap;
            while ((si = child.textContent.indexOf(' ')) >= 0) {
                if (child != null && si == 0) {
                    child.splitText(1);
                    child = child.nextSibling;
                } else if (child != null) {
                    child.splitText(si);
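                    spanWrap = document.createElement("span");
                    // textContent, not innerHTML: "<" or "&" in the text must stay text
                    spanWrap.textContent = child.textContent;
                    child.parentNode.replaceChild(spanWrap, child);
                    child = spanWrap.nextSibling;
                }
            }
            // Wrap the last word, unless only an empty or blank remainder is left
            if (child != null && /\S/.test(child.textContent)) {
                spanWrap = document.createElement("span");
                spanWrap.textContent = child.textContent;
                child.parentNode.replaceChild(spanWrap, child);
            }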
        }
    }
};

wrapWords(document.body);


mVChr