I want to loop through all the words in an HTML document and wrap a word in a span if it is an Arabic word. Any idea how this can be done in jQuery?

I have tried the following:

    var text = $('body').text().split(' ');
        for( var i = 0, len=text.length; i<len; i++ ) {

            var mytext = $(this).val();
            var arabic = /[\u0600-\u06FF]/;
            if(arabic.test(mytext)){
              text[i] = '<span class="arabic">' + text[i]  +  '</span>';
            }
        }
            $(this).html(text.join(' '));    
  }

But it doesn't appear to me that the .text method is the way to go. Any ideas?

perpetual_dream

1 Answer

You'll need to do this one level below where you normally work with jQuery: at the level of text nodes.

This HTML:

    <p>Hi there</p>

produces a p element containing a single text node. Normally with jQuery you only work with elements, but to do what you're doing non-destructively (without removing and recreating all of the elements, which will unhook all of their event handlers), you need to work at the node level, using tools like the DOM's splitText and insertBefore methods.

It's not complicated; it just means working at a different level.

My other answer here on Stack Overflow shows you how to walk through the text nodes of a document, locate text within them, split it out and put it in a wrapper element (in your case, the span). In that case, the code uses a simplistic regular expression to find text that looks like a link and makes it into an actual link, e.g. changing:

    <p>I found this information at http://stackoverflow.com.</p>

to

    <p>I found this information at <a href="http://stackoverflow.com">http://stackoverflow.com</a>.</p>

You can see how that's very, very similar to what you want to do. You basically just change how you want to find text, and the rest of the code already does the job.
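
To make the "find the text, split it out" step concrete without needing a DOM, here's a rough string-level sketch of the same idea. The `splitRuns` name and the shape of the objects it returns are mine, purely for illustration; the DOM code does the equivalent splitting with `splitText` on text nodes rather than `slice` on strings.

```javascript
// Split `text` into consecutive runs, flagging the runs that match `re`.
// `re` must not use the `g` flag: each pass re-runs it from the start
// of whatever text is left.
function splitRuns(text, re) {
  var runs = [];
  var rest = text;
  var match;

  while (rest.length > 0 && (match = re.exec(rest))) {
    if (match[0].length === 0) {
      break; // guard against patterns that can match nothing
    }
    if (match.index > 0) {
      // Text before the match stays unwrapped
      runs.push({ text: rest.slice(0, match.index), matched: false });
    }
    // The match itself is what would go inside the span
    runs.push({ text: match[0], matched: true });
    // Carry on with whatever follows the match
    rest = rest.slice(match.index + match[0].length);
  }
  if (rest.length > 0) {
    runs.push({ text: rest, matched: false });
  }
  return runs;
}
```

For example, `splitRuns("Hi مرحبا there", /[\u0600-\u06FF]+/)` gives three runs: `"Hi "` (not matched), `"مرحبا"` (matched), and `" there"` (not matched). Only the matched run would end up inside a span.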


Here's the code from that answer updated to use a regular expression that looks for any character in a word being in the range you quoted for Arabic characters:

    // The regex matches a series of characters in the given range.
    // (Double-check this, I believe there's a second Arabic range in
    // the Unicode standard, but I know next to nothing about Arabic.)
    // Note: don't add the `g` flag; handleText relies on each exec
    // call starting from the beginning of the text it's given.
    walk(document.body, /[\u0600-\u06FF]+/);

    function walk(node, targetRe) {
      var child, next;

      switch (node.nodeType) {
        case 1: // Element
          for (child = node.firstChild; child; child = next) {
            // Grab the next sibling first: handleText may remove
            // `child` and inserts new nodes after it, which we must
            // not visit again (that would wrap the same text twice).
            next = child.nextSibling;
            walk(child, targetRe);
          }
          break;

        case 3: // Text node
          handleText(node, targetRe);
          break;
      }
    }

    function handleText(node, targetRe) {
      var match, targetNode, followingNode, wrapper;

      // Loop for as long as the remaining text contains a match
      while (node && (match = targetRe.exec(node.nodeValue))) {
        // Split at the beginning of the match
        targetNode = node.splitText(match.index);

        // Split at the end of the match.
        // match[0] is the full text that was matched.
        followingNode = targetNode.splitText(match[0].length);

        // Wrap the target in a `span` element with an `arabic` class.
        // First we create the wrapper and insert it in front
        // of the target text.
        wrapper = document.createElement('span');
        wrapper.className = "arabic";
        targetNode.parentNode.insertBefore(wrapper, targetNode);

        // Now we move the target text inside it
        wrapper.appendChild(targetNode);

        // Clean up any empty nodes (in case the target text
        // was at the beginning or end of a text node)
        if (node.nodeValue.length == 0) {
          node.parentNode.removeChild(node);
        }
        if (followingNode.nodeValue.length == 0) {
          followingNode.parentNode.removeChild(followingNode);
          followingNode = null;
        }

        // Continue with the next match in the rest of the text, if any
        node = followingNode;
      }
    }

Compared with the link version, the main changes are the regular expression and the wrapper element that gets created.
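
On the "second Arabic range" question raised in the code comment: Unicode does define several more Arabic blocks beyond U+0600–U+06FF. Here's a sketch of two ways you might widen the pattern. The variable names are mine, and which ranges you actually want depends on your content, so treat this as a starting point rather than a definitive list.

```javascript
// Block-based: Arabic, Arabic Supplement, Arabic Extended-A,
// Arabic Presentation Forms-A, and Arabic Presentation Forms-B.
// The last range stops at U+FEFC so it doesn't pick up U+FEFF,
// which is the byte order mark rather than a letter.
var arabicBlocks =
  /[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFC]+/;

// Script-based alternative using a Unicode property escape.
// This needs the `u` flag and an engine with ES2018 support.
var arabicScript = /\p{Script=Arabic}+/u;
```

Either one can be passed to `walk` in place of the original pattern, e.g. `walk(document.body, arabicBlocks)`. The two won't necessarily match exactly the same set of characters, since one goes by block and the other by script.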

T.J. Crowder