
I usually use DOMDocument and DOMXPath to parse and traverse my XML.

Now I have a 2.5 MB XML file and have just realized that DOMDocument is extremely slow at handling it.

I searched and found XMLReader, which is said to be a better approach and probably the fastest way to handle large XML files.

The problem is that I don't know how to combine DOMXPath with XMLReader.

How do I convert the following code to XMLReader?

$dom = new DOMDocument;
$dom->load('myxml.xml');

$xp = new DOMXPath($dom);
$xp->registerNamespace('f', 'my-name-space:namespace');

$expression = "//f:my-person/@name";
$col = $xp->query($expression);

I've tried this, but it's still slow:

Note: This is hypothetical code adapted from the thread "XML DOMDocument optimization"; the main point is the same.

$reader = new XMLReader();
$reader->open('myxml.xml');

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            if ($reader->localName == "game") {
                // expand the current element and copy it into a fresh document
                $node = $reader->expand();
                $dom = new DOMDocument();
                $n = $dom->importNode($node, true);
                $dom->appendChild($n);
                $xp = new DOMXPath($dom);
                $xp->registerNamespace('f', 'my-name-space:namespace');

                $expression = "//f:my-person/@name";
                $col = $xp->query($expression);
            }
            break;
    }
}

Feel free to suggest any PHP library that would help with this.

mending3
  • It would probably be faster (can't give any figures) if you look for the `my-person` elements rather than the `game` elements. You then just need to get the attribute rather than having to load XML and use XPath. – Nigel Ren Jun 16 '21 at 08:15
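The suggestion in the comment above could be sketched roughly as follows. This is a minimal example, not a benchmarked solution: the sample XML, the `root` element, and the person names are made up for illustration, while `game`, `my-person`, `name`, and the namespace URI come from the question. It reads the attribute straight from the reader, so no DOM document or XPath is involved.

```php
<?php
// Hypothetical sample data; only the element, attribute, and namespace
// names are taken from the question.
$xml = <<<'XML'
<root xmlns="my-name-space:namespace">
  <game>
    <my-person name="Alice"/>
    <my-person name="Bob"/>
  </game>
  <game>
    <my-person name="Carol"/>
  </game>
</root>
XML;

$reader = new XMLReader();
// For a real file this would be $reader->open('myxml.xml');
$reader->open('data:text/xml;base64,' . base64_encode($xml));

$names = [];
while ($reader->read()) {
    // only element nodes named "my-person" in the expected namespace
    if (
        $reader->nodeType === XMLReader::ELEMENT &&
        $reader->localName === 'my-person' &&
        $reader->namespaceURI === 'my-name-space:namespace'
    ) {
        // read the attribute directly from the stream
        $name = $reader->getAttribute('name');
        if ($name !== null) {
            $names[] = $name;
        }
    }
}
$reader->close();

echo implode(', ', $names), PHP_EOL; // prints: Alice, Bob, Carol
```

Whether this is actually faster for a given file would need to be measured; it mainly avoids building a DOM tree for every `game` element.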

1 Answer


XMLReader is not optimized for speed; it is optimized for memory consumption. It lets you load only part of an XML file into memory at a time.

You can expand into a prepared document instance. Here is a classic book example:

$xmlString = <<<'XML'
<books xmlns="urn:example-books">
  <book>
    <title isbn="978-0596100087">XSLT 1.0 Pocket Reference</title>
  </book>
  <book>
    <title isbn="978-0596100506">XML Pocket Reference</title>
  </book>
</books>
XML;

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode($xmlString));

// prepare a document instance
$document = new DOMDocument();
$xpath = new DOMXPath($document);
// register a namespace prefix
$xpath->registerNamespace('b', 'urn:example-books');

// look for the first "book" element in the defined namespace 
while (
  $reader->read() &&
  (
    $reader->localName !== 'book' ||
    $reader->namespaceURI !== 'urn:example-books'
  )
) {
  continue;
}

// as long as there is a "book" element ...
while ($reader->localName === 'book') {
  // ... in the defined namespace
  if ($reader->namespaceURI === 'urn:example-books') {
    // expand into the prepared document instance
    $book = $reader->expand($document);
    // use xpath with the context argument
    var_dump(
      $xpath->evaluate('string(b:title/@isbn)', $book),
      $xpath->evaluate('string(b:title)', $book)
    );
  }
  // go to the next book sibling element
  $reader->next('book');
}
$reader->close();

Output:

string(14) "978-0596100087"
string(25) "XSLT 1.0 Pocket Reference"
string(14) "978-0596100506"
string(20) "XML Pocket Reference"
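Applied to the names from the question, the same idea might look like the sketch below. The sample XML and the `root` element are assumptions made for illustration; `game`, `my-person`, `name`, and the namespace URI are taken from the question. Each `game` element is expanded into one shared document, queried with a relative XPath expression, and then skipped with `next()` so its subtree is not read a second time.

```php
<?php
// Hypothetical sample data; only the element, attribute, and namespace
// names are taken from the question.
$xml = <<<'XML'
<root xmlns="my-name-space:namespace">
  <game>
    <my-person name="Alice"/>
    <my-person name="Bob"/>
  </game>
  <game>
    <my-person name="Carol"/>
  </game>
</root>
XML;

$reader = new XMLReader();
// For a real file this would be $reader->open('myxml.xml');
$reader->open('data:text/xml;base64,' . base64_encode($xml));

// one document and one XPath instance, reused for every expanded node
$document = new DOMDocument();
$xpath = new DOMXPath($document);
$xpath->registerNamespace('f', 'my-name-space:namespace');

$names = [];
$found = $reader->read();
while ($found) {
    if (
        $reader->nodeType === XMLReader::ELEMENT &&
        $reader->localName === 'game' &&
        $reader->namespaceURI === 'my-name-space:namespace'
    ) {
        // expand into the prepared document and query relative to the node
        $game = $reader->expand($document);
        foreach ($xpath->query('.//f:my-person/@name', $game) as $attribute) {
            $names[] = $attribute->value;
        }
        // skip the subtree that was just expanded
        $found = $reader->next();
    } else {
        $found = $reader->read();
    }
}
$reader->close();

echo implode(', ', $names), PHP_EOL; // prints: Alice, Bob, Carol
```

Note the leading `.` in the expression: `//f:my-person/@name` would start from the document root instead of the expanded `game` node.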
ThW