3

I want to capture the text of an attribute within an XML tag. For example:

<tag1 name="tag^*&,+">

I want to capture the value within the name attribute (which in this case would be tag^*&,+). This regular expression

name=\"([a-z0-9]+)\"  

will only return the value if the attribute text consists of lowercase letters and digits. Is there any syntax that will return the captured value regardless of which symbols and characters it contains? Thanks!

axsuul

5 Answers

6

At the risk of beating a dead horse, don't try to "parse" XML with regular expressions. Use your programming language's XML library. It is then dead simple to select all tag1 elements and get the contents of their name attributes.

Not only is it easier for you to code, but you won't have to deal with nasty things like strings spanning multiple lines, string escapes (e.g. &quot;), weird edge cases that cause your regex to fail, etc.
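To make the parser suggestion concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The question doesn't say which language is in use, so Python is an assumption, and the `<root>` wrapper and the second `tag1` element are made up for illustration. Note that a raw `&` is not allowed inside an attribute value in well-formed XML, so the sample writes it as `&amp;`; a parser would reject the tag exactly as typed in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the <root> wrapper and the second tag1
# are invented for this example. The ampersand is escaped as &amp;
# because a bare & is not well-formed XML.
xml_text = """<root>
  <tag1 name="tag^*&amp;,+"/>
  <tag1 name="say &quot;hi&quot;"/>
</root>"""

root = ET.fromstring(xml_text)

# Collect the name attribute of every tag1 element. The parser has
# already turned &amp; and &quot; back into the literal characters.
names = [el.get("name") for el in root.iter("tag1")]
print(names)
```

The escaped quotes in the second element are the kind of case a simple regex would get wrong, while the parser returns the decoded value without extra work.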

rjh
  • +1 - with the caveat that there may be times when you don't want/need the overhead of an XML parser. – wsorenson Mar 05 '10 at 01:12
  • Reluctantly agreed... if you have a huge document and you're very confident about the form the XML will take, regexes can be a useful and seductive tool. But I've been burned by their fiery kiss too many times. – rjh Mar 05 '10 at 01:21
1

You should use:

name=\"([^\"]+)\"

In other words, the capturing group matches one or more characters that are not a double quote, so it stops at the closing quotation mark.
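As a quick check of that pattern, here is a small sketch in Python (the language is an assumption, since the question doesn't name one). In a Python raw string the quotes don't need backslashes, so the pattern is written as `name="([^"]+)"`.

```python
import re

# Same pattern as above; inside a raw string the double quotes
# need no backslash escaping.
pattern = re.compile(r'name="([^"]+)"')

line = '<tag1 name="tag^*&,+">'
match = pattern.search(line)

# group(1) is whatever sat between the two quotes.
value = match.group(1) if match else None
print(value)
```

Because of the `+`, an empty attribute such as `name=""` would not match; swapping it for `*` would allow that case.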

wsorenson
1

Check out regular-expressions.info

This will do what you want:

([^"]+)
jasonbar
  • And of course the obligatory "use an XML parser, regular expressions aren't suitable blah blah.." – jasonbar Mar 05 '10 at 01:09
  • Your lust for rep has aided and abetted Axsuul's descent into regex hell! – rjh Mar 05 '10 at 01:10
  • @rjh: hahah..although in this case he appears to have a fairly regular subset he is looking to handle...maybe just purgatory..? – jasonbar Mar 05 '10 at 01:16
1

It seems that you're better off using an XML parser. I don't know what language you're using, but there's an XML parser for just about every language out there.

Michael D. Irizarry
0

. will match any character.

name=\"(.+)\"
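One thing to watch with `.+`: it is greedy, so if the tag has more than one quoted attribute the capture runs on to the last quote on the line. A short Python sketch (Python and the extra `id` attribute are assumptions for illustration) comparing it with the lazy form `.+?`:

```python
import re

# Hypothetical tag with a second attribute added to show the difference.
line = '<tag1 name="tag^*&,+" id="42">'

# Greedy: .+ grabs as much as it can, then backs up to the LAST quote.
greedy = re.search(r'name="(.+)"', line).group(1)

# Lazy: .+? grabs as little as it can, stopping at the FIRST closing quote.
lazy = re.search(r'name="(.+?)"', line).group(1)

print(greedy)
print(lazy)
```

With a single attribute on the line both forms give the same result, which is why the greedy version can appear to work at first.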
Jesse Vogt