
I need to tokenize the following tag:

{TagName attrib1="value1" attrib2="value 3"}

I would like to write a regex to do it, but the trouble is that an attribute value can contain spaces, so I can't just split on spaces.

Daniel Vandersluis
Dan
  • 6
    [You really shouldn't try to parse XML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – eliah Sep 28 '10 at 14:54
  • 1
    You need a real parser. You can write one yourself using `indexOf` (it's just a state machine with a stack, after all), but better is to use a parser generator such as Antlr: http://www.antlr.org/ – Anon Sep 28 '10 at 15:06
  • Tags are not compound and this is about as complicated as it gets, so I thought it might be a bit simpler than full-blown XML... – Dan Sep 28 '10 at 15:38

1 Answer

1

It can't be put more clearly than this:

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Please explain why you need a regexp.

Also, you didn't say which language you prefer.

Assuming Perl:

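use strict;
use warnings;

my $str = "{TagName attrib1=\"value1\" attrib2=\"value 3\"}";

# Matches a tag with exactly two attributes. The non-greedy (.*?)
# stops at the next double quote, so a value may contain spaces.
# The opening brace is escaped because newer Perls warn about a bare { in a pattern.
if ($str =~ m/\{(\w+)\s+(\w+)="(.*?)"\s+(\w+)="(.*?)"/)
{
    print "tagname: $1\n";
    print "attrib: $2\n";
    print "value: $3\n";
    print "attrib: $4\n";
    print "value: $5\n";
}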
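
The pattern above is hard-wired to two attributes. If the number of attributes can vary, one option is to match the tag as a whole first and then pull out the name="value" pairs in a second pass. Here is a rough sketch of that idea in Python (my choice, since the question names no language). The names `TAG_RE`, `ATTR_RE` and `tokenize_tag` are made up for illustration, and it assumes values are always double-quoted and never contain a double quote themselves:

```python
import re

# Whole tag: "{", a name, zero or more name="value" pairs, "}".
TAG_RE = re.compile(r'\{(\w+)((?:\s+\w+="[^"]*")*)\s*\}')
# One name="value" pair; [^"]* allows spaces inside the value but no quotes.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def tokenize_tag(text):
    """Return (tag_name, [(attr, value), ...]), or None if text is not a tag."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        return None
    # Second pass: extract every pair from the attribute section.
    return match.group(1), ATTR_RE.findall(match.group(2))


print(tokenize_tag('{TagName attrib1="value1" attrib2="value 3"}'))
# → ('TagName', [('attrib1', 'value1'), ('attrib2', 'value 3')])
```

This only holds up as long as the format stays this simple: no escaped quotes, no nesting.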

But again, don't use regexps for this!!
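
For a format this small, a regex-free alternative is a hand-written scanner that walks the string once, as suggested in the comments on the question. Below is a minimal sketch in Python (again my assumption, the question names no language). `scan_tag` and its helpers are hypothetical names, and it assumes double-quoted values with no escaped quotes inside them:

```python
def scan_tag(text):
    """Tokenize '{Name key="value" ...}' with a character scanner, no regex."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("tag must be wrapped in braces")
    body = text[1:-1]
    pos, n = 0, len(body)

    def skip_spaces():
        nonlocal pos
        while pos < n and body[pos].isspace():
            pos += 1

    def read_name():
        nonlocal pos
        start = pos
        while pos < n and (body[pos].isalnum() or body[pos] == "_"):
            pos += 1
        if pos == start:
            raise ValueError(f"expected a name at offset {start}")
        return body[start:pos]

    skip_spaces()
    tag_name = read_name()
    attributes = []
    skip_spaces()
    while pos < n:
        key = read_name()
        if body[pos:pos + 2] != '="':
            raise ValueError(f'expected =" after {key!r}')
        pos += 2
        end = body.find('"', pos)  # the value runs to the next double quote
        if end == -1:
            raise ValueError(f"unterminated value for {key!r}")
        attributes.append((key, body[pos:end]))
        pos = end + 1
        skip_spaces()
    return tag_name, attributes


print(scan_tag('{TagName attrib1="value1" attrib2="value 3"}'))
# → ('TagName', [('attrib1', 'value1'), ('attrib2', 'value 3')])
```

Unlike the regex, this reports where the input went wrong instead of just failing to match, and it is easier to extend if the format grows.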

Fredrik Pihl
  • the classic post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454 – bsamek Sep 28 '10 at 15:09