I am building a JavaScript application in which I need to find the HTML tags that enclose a user's selection and, for ease of use, put them in an array.

I used htmlText which gave me a string that looks something like this:

<h1><span style="color: rgb(102, 51, 153); font-weight: bold; text-decoration: underline;"><sub>test</sub></span></h1>

Since I have hardly any knowledge of regular expressions and what I know just doesn't seem to do what I want, I was hoping one of you guys could help me on this part.

So what is the best way to make the above string look like the following array?

<h1>,
<span style="color: rgb(102, 51, 153); font-weight: bold; text-decoration: underline;">,
<sub>

My code so far (Don't know if I am on the right track though):

var fullhtml = SEOM_common.range.htmlText;//Get user selection + Surrounding html tags
var tags = fullhtml.split(SEOM_common.selected_value);//Split by user selection
var tags_arr = tags[0].match(/<(.+)>/);//Create array of tags

Thanks guys for the answers and comments. I managed to build the following method, which does exactly what I want.

find_all_parents : function(selectRange, endNode){
    var nodes = [];
    var nodes_to_go = [];
    if(selectRange.commonAncestorContainer) nodes_to_go.push(selectRange.commonAncestorContainer.parentNode);//all browsers
    else nodes_to_go.push(selectRange.parentElement());//IE<9 browsers

    var node;

    //walk up the ancestors; stop at the root or at the tag named by endNode (lowercase)
    //the nodeType check avoids calling toLowerCase on nodes without a tagName (e.g. the document)
    while( (node=nodes_to_go.pop()) && !(node.nodeType === 1 && node.tagName.toLowerCase() == endNode) ){
        if(node.nodeType === 1){ //only element nodes (tags)
            nodes.push(node);
        }

        nodes_to_go.push(node.parentNode);
    }
    return nodes;
}
Rustam
  • You shouldn't parse HTML with regular expressions. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Will Jun 27 '12 at 17:49
  • Have to agree with William. There are far better ways to pull things out of the DOM. – Paul Alan Taylor Jun 27 '12 at 18:05

2 Answers

Don't use regex for this. Use document manipulation methods instead and fetch the tags themselves (instead of the textual representation of the tags).

For example:

var find_all_nodes = function(rootNode){
    var nodes = [];
    var nodes_to_go = [rootNode];
    var node;
    while( (node=nodes_to_go.pop()) ){
        if(node.nodeType === 1){ //only element nodes (tags)
            nodes.push(node);
        }
        var cs = node.childNodes;
        for(var i=0; i<cs.length; i++){
            nodes_to_go.push(cs[i]);
        }
    }
    return nodes;
}

Once you have a tag you can get all sorts of information from it. I recommend checking out the DOM docs from MDN and the compatibility notes from Quirksmode.
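As a rough illustration of reading information from the collected nodes, here is a minimal sketch that runs the same kind of depth-first walk and then pulls out the tag names. The helpers `el`, `text` and `collect_elements` are hypothetical names, and the tree is built from plain objects standing in for DOM nodes (an assumption so the sketch runs outside a browser); in a page you would pass a real element instead.

```javascript
// Hypothetical stand-ins for DOM nodes, so the sketch runs without a browser.
function el(tagName, children) {
    return { nodeType: 1, tagName: tagName, childNodes: children || [] };
}
function text(value) {
    return { nodeType: 3, nodeValue: value, childNodes: [] };
}

// Depth-first walk that keeps only element nodes (nodeType 1).
function collect_elements(rootNode) {
    var nodes = [];
    var nodes_to_go = [rootNode];
    var node;
    while ((node = nodes_to_go.pop())) {
        if (node.nodeType === 1) {
            nodes.push(node);
        }
        var cs = node.childNodes;
        for (var i = 0; i < cs.length; i++) {
            nodes_to_go.push(cs[i]);
        }
    }
    return nodes;
}

// Mirrors the markup from the question: <h1><span><sub>test</sub></span></h1>
var tree = el("H1", [el("SPAN", [el("SUB", [text("test")])])]);

// Each collected node is an object, so properties such as tagName are available.
var tag_names = collect_elements(tree).map(function (n) {
    return n.tagName.toLowerCase();
});
```

With real DOM elements the same loop gives access to attributes and inline styles as well, not just the tag name.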

hugomg

You should not use Regex for HTML/XML parsing.

...unless you have a good reason to do so!

If so, then replace (<h1>)(<span[^>]*>)(<sub>)[^<]*</sub></span></h1> with $1,\n$2,\n$3.
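Since the goal is an array rather than a reformatted string, the capture groups of that pattern can also be collected directly. This is only a sketch under the assumption that the input always has exactly this h1/span/sub nesting; the variable names `html`, `pattern` and `tags_arr` are illustrative, and the forward slashes are escaped because the pattern is written as a JavaScript regex literal.

```javascript
// Sample input taken from the question.
var html = '<h1><span style="color: rgb(102, 51, 153); font-weight: bold; text-decoration: underline;"><sub>test</sub></span></h1>';

// Same pattern as above, with "/" escaped for a regex literal.
var pattern = /(<h1>)(<span[^>]*>)(<sub>)[^<]*<\/sub><\/span><\/h1>/;

// match() returns null when the markup has a different shape,
// so fall back to an empty array in that case.
var match = html.match(pattern);
var tags_arr = match ? match.slice(1) : [];
```

Any other nesting or tag order yields an empty array, which is the main reason the DOM-based approach is the more robust one.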

Ωmega