I have an XML document from which I want to extract some data:

<tnt:results>
<tnt:result>
<Document id="id1">
<impact _blabla_ for="tree.def" name="Something has changed"
select="moreblabla">true</impact>
<impact _blabla_ for="plant.def" name="Something else has changed"
select="moreblabla">true</impact>
</Document>
</tnt:result>
</tnt:results>

In reality there are no line breaks: it's one continuous string, and there can be multiple <Document> elements. I want a regular expression that extracts:

  • id1
  • tree.def / plant.def
  • Something has changed / Something else has changed

I was able to come up with this code so far, but it only matches the first impact, rather than both of them:

preg_match_all('/<Document id="(.*)">(<impact.*for="(.*)".*name="(.*)".*<\/impact>)*<\/Document>/U', $response, $matches);

The other way to do it would be to match everything inside each Document element and pass that through a RegEx once more, but I thought I could do this with only one RegEx.
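For reference, a minimal sketch of that two-pass idea. A repeated capture group in PCRE reports the captures of only one iteration, so a single pattern cannot hand back every impact per Document. The sample string, the variable names ($docs, $impacts, $results) and the omission of the _blabla_ placeholder are assumptions for illustration, and the pattern assumes for appears before name inside each tag, as in the sample.

```php
<?php
// Sample input as one continuous string (placeholder attributes omitted).
$response = '<tnt:results><tnt:result><Document id="id1">'
    . '<impact for="tree.def" name="Something has changed" select="moreblabla">true</impact>'
    . '<impact for="plant.def" name="Something else has changed" select="moreblabla">true</impact>'
    . '</Document></tnt:result></tnt:results>';

$results = [];

// Pass 1: one match per Document element, capturing its id and inner content.
preg_match_all('/<Document id="([^"]*)">(.*?)<\/Document>/s', $response, $docs, PREG_SET_ORDER);

foreach ($docs as $doc) {
    // Pass 2: one match per impact tag inside this Document.
    preg_match_all('/<impact\b[^>]*\bfor="([^"]*)"[^>]*\bname="([^"]*)"/', $doc[2], $impacts, PREG_SET_ORDER);
    foreach ($impacts as $impact) {
        // Each row: [document id, "for" attribute, "name" attribute]
        $results[] = [$doc[1], $impact[1], $impact[2]];
    }
}

print_r($results);
```

This only holds while the markup stays as regular as the sample; reordered attributes or nested Document elements would break it.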

Thanks a lot in advance!

Richard JP Le Guen
cdavid
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags Everyone gets it once; I certainly have. – Dan Lugg Jun 11 '11 at 02:56

2 Answers

Just use DOM, it's easy enough:

// $xml_string holds the XML response as a single string
$dom = new DOMDocument;
$dom->loadXML($xml_string);

$documents = $dom->getElementsByTagName('Document');
foreach ($documents as $document) {
    echo $document->getAttribute('id'), PHP_EOL;     // id1

    // Only the impact elements nested inside this Document
    $impacts = $document->getElementsByTagName('impact');
    foreach ($impacts as $impact) {
        echo $impact->getAttribute('for'), PHP_EOL;  // tree.def, then plant.def
        echo $impact->getAttribute('name'), PHP_EOL; // Something has changed, then Something else has changed
    }
}
netcoder
Don't use RegEx. Use an XML parser.

Really, if you have to worry about multiple Document elements and extracting all sorts of attributes, you're much better off using an XML parser or a query language like XPath.
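As a rough illustration of the XPath route, here is a minimal sketch using PHP's DOMXPath. The sample string, the xmlns:tnt declaration (added so the tnt: prefix resolves), the $rows variable and the omission of the _blabla_ placeholder are assumptions, not part of the original response.

```php
<?php
// Sample input as one continuous string; the xmlns:tnt URI is made up for this sketch.
$xml_string = '<tnt:results xmlns:tnt="urn:example:tnt"><tnt:result>'
    . '<Document id="id1">'
    . '<impact for="tree.def" name="Something has changed" select="moreblabla">true</impact>'
    . '<impact for="plant.def" name="Something else has changed" select="moreblabla">true</impact>'
    . '</Document>'
    . '</tnt:result></tnt:results>';

$dom = new DOMDocument;
$dom->loadXML($xml_string);
$xpath = new DOMXPath($dom);

$rows = [];
// Select every impact element that is a direct child of a Document element.
foreach ($xpath->query('//Document/impact') as $impact) {
    $rows[] = [
        'id'   => $impact->parentNode->getAttribute('id'), // id of the enclosing Document
        'for'  => $impact->getAttribute('for'),
        'name' => $impact->getAttribute('name'),
    ];
}

print_r($rows);
```

Each row pairs an impact with the id of its enclosing Document, so multiple Document elements need no extra handling.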

Richard JP Le Guen