I am trying to pull the attributes out of a piece of submitted text in JavaScript and turn them into an array.

So the user submits this:

<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1"></iframe>

and I would get:

arr['src'] = http://www.stackoverflow.com/
arr['width'] = 123
arr['height'] = 123
arr['frameborder'] = 1

I think I just need a regexp, but any help would be great!

Jeffrey Hunter
    An HTML parser might be safer - then you don't have to worry about escaping and corner cases, particularly if this is user-supplied input. – Rup Oct 26 '11 at 13:55

5 Answers

I recommend using a RegExp to parse user-supplied HTML instead of creating a DOM object, because you don't want to load external content (iframe, script, link, style, object, ...) when performing a "simple" task such as getting the attribute values from an HTML string.

Using methods similar to those in my previous answer, I've created a function to match attribute values. Both quoted and non-quoted attributes are matched.

The code currently returns an object with attributes from the first tag, but it's easily extensible to retrieve all HTML elements (see bottom of answer).

Fiddle: http://jsfiddle.net/BP4nF/1/

// Example:
var htmlString = '<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1" non-quoted=test></iframe>';
var arr = parseHTMLTag(htmlString);
//arr is the desired object. An easy method to verify:
alert(JSON.stringify(arr));

function parseHTMLTag(htmlString){
    var tagPattern = /<[a-z]\S*(?:[^<>"']*(?:"[^"]*"|'[^']*'))*?[^<>]*(?:>|(?=<))/i;
    var attPattern = /([-a-z0-9:._]+)\s*=(?:\s*(["'])((?:[^"']+|(?!\2).)*)\2|([^><\s]+))/ig;
    // 1 = attribute, 2 = quote, 3 = value, 4=non-quoted value (either 3 or 4)

    var tag = htmlString.match(tagPattern);
    var attributes = {};
    if(tag){ //If there's a tag match
        tag = tag[0]; //Match the whole tag
        var match;
        while((match = attPattern.exec(tag)) !== null){
            //match[1] = attribute, match[3] = value, match[4] = non-quoted value
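            // Check for undefined explicitly so an empty quoted value (attr="") is kept as ""
            attributes[match[1]] = match[3] !== undefined ? match[3] : match[4];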
        }
    }
    return attributes;
}

The output of the example is equivalent to:

var arr = {
    "src": "http://www.stackoverflow.com/",
    "width": "123",
    "height": "123",
    "frameborder": "1",
    "non-quoted": "test"
};

Extra: Modifying the function to get multiple matches (only showing code to update)

function parseHTMLTags(htmlString){
    var tagPattern = /<([a-z]\S*)(?:[^<>"']*(?:"[^"]*"|'[^']*'))*?[^<>]*(?:>|(?=<))/ig;
    // 1 = tag name
    var attPattern = /([-a-z0-9:._]+)\s*=(?:\s*(["'])((?:[^"']+|(?!\2).)*)\2|([^><\s]+))/ig;
    // 1 = attribute, 2 = quote, 3 = value, 4=non-quoted value (either 3 or 4)

    var htmlObject = [];
    var tag, match, attributes;
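    while((tag = tagPattern.exec(htmlString)) !== null){
        attributes = {};
        // tag is a match array: tag[0] is the whole tag, tag[1] the tag name
        while((match = attPattern.exec(tag[0])) !== null){
            attributes[match[1]] = match[3] !== undefined ? match[3] : match[4];
        }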
        htmlObject.push({
            tagName: tag[1],
            attributes: attributes
        });
    }
    return htmlObject; //Array of all HTML elements
}
Rob W

Assuming you're doing this client side, you're better off not using RegExp, but using the DOM:

var tmp = document.createElement("div");
tmp.innerHTML = userStr;

tmp = tmp.firstChild;
console.log(tmp.src);
console.log(tmp.width);
console.log(tmp.height);
console.log(tmp.frameBorder);

Just make sure you don't add the created element to the document without sanitizing it first. You might also need to loop over the created nodes until you get to an element node.
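
The loop over the created nodes mentioned above could look roughly like the following. This is only a sketch: `firstElementNode` is a hypothetical helper name, not part of the DOM API, and it relies solely on the standard `nodeType` and `nextSibling` properties.

```javascript
// Hypothetical helper: starting at `node`, walk the sibling chain until
// an element node is found. Returns null if there is none.
function firstElementNode(node) {
    var ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE
    while (node && node.nodeType !== ELEMENT_NODE) {
        node = node.nextSibling;
    }
    return node || null;
}

// Browser usage (assumes `userStr` holds the submitted markup):
// var tmp = document.createElement("div");
// tmp.innerHTML = userStr;
// var el = firstElementNode(tmp.firstChild);
```

This matters when the submitted string starts with whitespace or text, in which case `firstChild` is a text node rather than the iframe.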

Andy E

Assuming they will always enter an HTML element, you could parse it and read the attributes from the DOM, like so (untested):

var getAttributes = function(str) {
  var a={}, div=document.createElement("div");
  div.innerHTML = str;
  var attrs=div.firstChild.attributes, len=attrs.length, i;
  for (i=0; i<len; i++) {
    a[attrs[i].nodeName] = attrs[i].nodeValue;
  }
  return a;
};

var x = getAttributes(inputStr);
x; // => {width:'123', height:'123', src:'http://...', ...}
maerics
  • You might want to test your solution in IE 6 & 7 :-p They'll just dump out every possible attribute an iframe element can have. – Andy E Oct 26 '11 at 14:01
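
One possible way around the behaviour described in that comment is to skip attribute nodes whose `specified` flag is false. The sketch below assumes that older IE versions report `specified === false` for attributes that were not written in the markup; `getSpecifiedAttributes` is a hypothetical helper name, and the approach is untested in those browsers.

```javascript
// Hypothetical helper: copy only the attributes that were explicitly set.
// Assumption: legacy IE marks default attributes with specified === false.
function getSpecifiedAttributes(element) {
    var result = {};
    var attrs = element.attributes;
    for (var i = 0; i < attrs.length; i++) {
        // Keep the attribute unless the browser says it was not specified
        if (attrs[i].specified !== false) {
            result[attrs[i].nodeName] = attrs[i].nodeValue;
        }
    }
    return result;
}

// Browser usage (assumes `inputStr` holds the submitted markup):
// var div = document.createElement("div");
// div.innerHTML = inputStr;
// var x = getSpecifiedAttributes(div.firstChild);
```

In browsers where `specified` is always true or absent, the check is a no-op and every attribute is copied.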

Instead of regexp, use pure JavaScript:

Grab iframe element:

var iframe = document.getElementsByTagName('iframe')[0];

and then access its properties using:

var arr = {
   src         : iframe.src,
   width       : iframe.width,
   height      : iframe.height,
   frameborder : iframe.frameBorder // the DOM property is camelCase
};
hsz

I would personally do this with jQuery, if possible. With it, you can create a DOM element without actually injecting it into your page, which would create a potential security hazard.

var userTxt = '<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1"></iframe>';
var userInput = $(userTxt);
console.log(userInput.attr('src'));
console.log(userInput.attr('width'));
console.log(userInput.attr('height'));
console.log(userInput.attr('frameborder'));
Kaivosukeltaja