0

I can't seem to build a good regular expression (in JavaScript) that extracts each attribute from an XML node. For example,

<Node attribute="one" attribute2="two" n="nth"></node>

I need an expression that gives me an array of

['attribute="one"', 'attribute2="two"', 'n="nth"']

... Any help would be appreciated. Thank you

James
  • Time for the [obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Kerrek SB Jul 25 '11 at 02:46
  • Why wouldn't you just use an XML parser library? – jfriend00 Jul 25 '11 at 02:54
  • @jfriend00 - probably because browsers have a built-in XML parser and suitable DOM methods already. – RobG Jul 25 '11 at 03:13
  • I'm not sure I want the overhead of an XML parser library, plus I'm rarely ever going to have well-formed XML. I'm actually parsing the diff generated by git. – James Jul 26 '11 at 01:30

4 Answers

4

In case you missed Kerrek's comment:

you can't parse XML with a regular expression.

And the link: RegEx match open tags except XHTML self-contained tags

You can get the attributes of a node by iterating over its attributes property:

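function getAttributes(el) {
  var r = [];
  // el.attributes is an array-like collection of the element's attribute nodes
  var a, atts = el.attributes;

  for (var i=0, iLen=atts.length; i<iLen; i++) {
    a = atts[i];
    // Each attribute node exposes its name and value
    r.push(a.name + ': ' + a.value);
  }
  alert(r.join('\n'));
}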

Of course, you probably want to do something other than just put them in an alert.
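
For instance, a minimal sketch that returns the attributes in the format the question asks for instead of alerting them. The name `listAttributes` and the `fakeNode` stand-in are hypothetical, added only so the snippet runs outside a browser; in a page you would pass a real element.

```javascript
// Hypothetical helper: returns the attributes as 'name="value"' strings.
// Values are not escaped, so a value containing a double quote would
// produce an ambiguous string.
function listAttributes(el) {
  var result = [];
  var atts = el.attributes;
  for (var i = 0; i < atts.length; i++) {
    result.push(atts[i].name + '="' + atts[i].value + '"');
  }
  return result;
}

// Stand-in for a DOM element (assumption: only the attributes property is needed)
var fakeNode = {
  attributes: [
    { name: 'attribute', value: 'one' },
    { name: 'attribute2', value: 'two' },
    { name: 'n', value: 'nth' }
  ]
};

var attributeStrings = listAttributes(fakeNode);
```

Returning the array rather than alerting it leaves the caller free to join, filter, or log the result.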

Here is an article on MDN that includes links to relevant standards:

https://developer.mozilla.org/En/DOM/Node.attributes

RobG
3

Try this:

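  <script type="text/javascript">
    // Capture the whole run of attributes between "<node" and "></node>" (case-insensitive)
    var myregexp = /<node((\s+\w+=\"[^\"]+\")+)><\/node>/im;
    var match = myregexp.exec("<Node attribute=\"one\" attribute2=\"two\" n=\"nth\"></node>");
    if (match != null) {
      // Drop the leading whitespace, then split on the whitespace between attributes
      var result = match[1].trim();
      var arrayAttrs = result.split(/\s+/);
      alert(arrayAttrs);
    }
  </script>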
Monday
  • I got about this far as well. Unfortunately, a space in the attribute value breaks this. Perhaps I need to first replace the spaces between the double quotes with underscores, then, after I split the array, change them back to spaces? – James Jul 26 '11 at 01:32
0

The regex is /\w+="\w+"/g (note the g flag, for global).

You can try it right now in your Firebug or Chrome console:

var matches = '<Node attribute="one" attribute2="two" n="nth"></node>'.match(/\w+="\w+"/g)
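
Because `\w+` only matches word characters, a value that contains a space (the case raised in the comments on the previous answer) or an empty value is skipped. A possible variation, offered as a sketch rather than a general XML solution, is to accept anything up to the closing quote. The variable names are illustrative, and the pattern still assumes double-quoted values and attribute names made only of word characters.

```javascript
// Assumed variation on the pattern above: [^"]* instead of \w+ for the value
var attributePattern = /\w+="[^"]*"/g;

var sample = '<Node attribute="one two" attribute2="" n="nth"></node>';

// The original pattern skips the value with a space and the empty value
var strictMatches = sample.match(/\w+="\w+"/g);

// The relaxed pattern picks up all three attributes
var relaxedMatches = sample.match(attributePattern);
```

Note that this matches `name="value"` text anywhere in the string, not only inside a tag, so it suits small, predictable inputs better than arbitrary markup.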
Pablo Fernandez
0

I think you could get it using the following. You would want the second and third capturing groups.

<[\w\d\-_]+\s+(([\w\d\-_]+)="(.*?)")*>
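
One caveat when using this in JavaScript: a capturing group inside a repetition keeps only its last iteration, so a single match cannot return every name and value. A common workaround, sketched here under the same assumptions as the pattern above (double-quoted values, names made of word characters and hyphens, no `>` inside a value), is to isolate the tag first and then loop over it with a global regex. The function name `extractAttributePairs` is hypothetical.

```javascript
// Hypothetical helper: returns [name, value] pairs for the first opening tag
// in the input that has at least one attribute.
function extractAttributePairs(source) {
  // Step 1: isolate the attribute text of the opening tag.
  // [^>]* assumes no '>' character appears inside an attribute value.
  var tagMatch = /<[\w\-]+\s+([^>]*)>/.exec(source);
  if (tagMatch === null) {
    return [];
  }
  // Step 2: walk the attribute text; each exec call on a global regex
  // resumes where the previous match ended.
  var pairPattern = /([\w\-]+)="(.*?)"/g;
  var pairs = [];
  var m;
  while ((m = pairPattern.exec(tagMatch[1])) !== null) {
    pairs.push([m[1], m[2]]);
  }
  return pairs;
}

var pairs = extractAttributePairs(
  '<Node attribute="one" attribute2="two words" n="nth"></node>'
);
```

Here `pairs[0][0]` is the first attribute's name and `pairs[0][1]` its value, which corresponds to the two inner groups of the pattern above.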
Ryan Gross
  • That won't work in a number of cases, such as if there's a namespace, e.g. ``, or an attribute name contains a colon (:) or a period (.) character (not included in the appropriate part of the regular expression) or the value contains a double quote character. – RobG Jul 25 '11 at 03:21