
I'm writing a script to parse this XML (the chromedriver storage file index fetched in the code below).

I want to parse all the <Contents> nodes with DOMDocument and DOMXPath. But for some reason, every XPath query I tried failed.

My code:

<?php

$apiUrl = 'https://chromedriver.storage.googleapis.com/?delimiter=/&prefix=98.0.4758.48/';
$xmlContents = file_get_contents($apiUrl);
if ($xmlContents === false) {
    throw new \Exception('Unable to fetch the chromedriver file index API response.');
}

// The DOMDocument has to exist before loadXML() can be called on it.
$xmlDom = new \DOMDocument();
if (!$xmlDom->loadXML($xmlContents)) {
    throw new \Exception('Unable to parse the chromedriver file index API response as XML.');
}
$xpath = new \DOMXPath($xmlDom);

// I tried several $query values here (listed below), for example:
$query = '//Contents';
$fileEntries = $xpath->query($query, null, false);
if (!$fileEntries instanceof \DOMNodeList) {
    throw new \Exception('Failed to evaluate the XPath expression into a node list.');
}

echo "There are {$fileEntries->length} results\n";
foreach ($fileEntries as $node) {
    /** @var \DOMNode $node */
    var_dump($node->nodeName);
}

XPath $query values I tried:

  • /ListBucketResult/Contents
  • /Contents
  • //Contents

All of these result in "There are 0 results".
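To reproduce this without the network request, here is a self-contained sketch that runs the same queries against a hypothetical, trimmed-down sample. The bucket name and file names in the sample are made up; only the default xmlns on the root element mirrors the real response. countMatches() is an illustrative helper name, not part of any API.

```php
<?php
// Count how many nodes an XPath query selects, with no namespaces registered.
function countMatches(string $xml, string $query): int
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Sample is not well-formed XML.');
    }
    $xpath = new DOMXPath($dom);
    $result = $xpath->query($query);
    if ($result === false) {
        throw new RuntimeException("Invalid XPath expression: $query");
    }
    return $result->length;
}

// Hypothetical sample: a default namespace on the root, no prefixes anywhere.
$sample = <<<'XML'
<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Name>example-bucket</Name>
  <Contents><Key>file-a.zip</Key></Contents>
  <Contents><Key>file-b.zip</Key></Contents>
</ListBucketResult>
XML;

foreach (['/ListBucketResult/Contents', '/Contents', '//Contents', '*'] as $query) {
    printf("%-28s => %d result(s)\n", $query, countMatches($sample, $query));
}
```

The three unprefixed paths select nothing, while * (evaluated relative to the root element, which is the default context for DOMXPath::query()) selects the root's three child elements.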

If I use * as the $query, it lists all the child elements of the <ListBucketResult> root node:

There are 10 results
string(4) "Name"
string(6) "Prefix"
string(6) "Marker"
string(9) "Delimiter"
string(11) "IsTruncated"
string(8) "Contents"
string(8) "Contents"
string(8) "Contents"
string(8) "Contents"
string(8) "Contents"

The easy workaround is to filter the nodes by their nodeName property. But I do want to know what went wrong with my XPath query. What did I miss?
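For reference, the nodeName-filter workaround could look roughly like this minimal sketch, again using a hypothetical trimmed sample; contentsByName() is a made-up helper. It compares localName instead of nodeName, because nodeName would include a prefix if the document ever used one.

```php
<?php
// Hypothetical sample with the same default namespace as the real response.
$sample = <<<'XML'
<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Name>example-bucket</Name>
  <Contents><Key>file-a.zip</Key></Contents>
  <Contents><Key>file-b.zip</Key></Contents>
</ListBucketResult>
XML;

// Select every child element of the root with '*' (which matches elements in
// any namespace), then keep only the ones whose local name is "Contents".
function contentsByName(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Sample is not well-formed XML.');
    }
    $xpath = new DOMXPath($dom);
    $matches = [];
    foreach ($xpath->query('*') as $node) {
        // localName is the element name without any namespace prefix.
        if ($node->localName === 'Contents') {
            $matches[] = $node;
        }
    }
    return $matches;
}

echo count(contentsByName($sample)), " <Contents> element(s)\n";
```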

Koala Yeung

1 Answer


What you missed (because it isn't visible in the view you were looking at) is that all the elements are in a namespace, because the root element actually is

<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">

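This default namespace declaration puts the element and all of its descendants in the namespace http://doc.s3.amazonaws.com/2006-03-01. In XPath 1.0, an unprefixed name such as Contents only matches elements that are in no namespace, which is why all of your queries returned nothing while * still matched. Registering a prefix for the namespace like this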

$xpath->registerNamespace("aws", "http://doc.s3.amazonaws.com/2006-03-01");

right after $xpath = new DOMXPath($xmlDom);, and then using that prefix on every element name in your XPath expressions, like this

/aws:ListBucketResult/aws:Contents

should solve your problem.
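Putting this together, a minimal sketch could look like the following. It uses a hypothetical trimmed sample instead of the live URL, assumes each <Contents> holds a <Key> child (as in S3-style bucket listings), and listKeys() is an illustrative name. Note that every element name in the path needs the prefix, including child elements such as Key.

```php
<?php
// Namespace URI declared (as the default namespace) on the root element.
const S3_NS = 'http://doc.s3.amazonaws.com/2006-03-01';

// Return the text of each <Key> under /ListBucketResult/Contents.
function listKeys(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Response is not well-formed XML.');
    }
    $xpath = new DOMXPath($dom);
    // Map the prefix "aws" to the document's default namespace URI.
    $xpath->registerNamespace('aws', S3_NS);

    $keys = [];
    foreach ($xpath->query('/aws:ListBucketResult/aws:Contents') as $contents) {
        // Evaluate relative to each <Contents>; Key needs the prefix as well.
        $keys[] = $xpath->evaluate('string(aws:Key)', $contents);
    }
    return $keys;
}

// Hypothetical sample standing in for the real API response.
$sample = <<<'XML'
<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Name>example-bucket</Name>
  <Contents><Key>file-a.zip</Key></Contents>
  <Contents><Key>file-b.zip</Key></Contents>
</ListBucketResult>
XML;

print_r(listKeys($sample));
```

Against the real endpoint, the same function would be called as listKeys(file_get_contents($apiUrl)), ideally keeping the fetch error check from the question.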

zx485
  • Unfortunately, the URL http://doc.s3.amazonaws.com/2006-03-01 gives me an "access denied" response. – Koala Yeung Jan 18 '22 at 13:16
  • The URI of an XML namespace _doesn't have to point to an actual resource_; it only serves as a unique identifier. See [here](https://stackoverflow.com/a/27614076/1305969). – zx485 Jan 18 '22 at 13:27
  • Thanks! This is a bit non-intuitive. The nodes don't actually have the `aws:` prefix (or any other prefix), so I didn't think registering a namespace was necessary. – Koala Yeung Jan 18 '22 at 14:01
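To illustrate the point raised in the last comment: the prefix in an XPath expression is only a local alias that DOMXPath maps to a namespace URI, and elements are matched by that URI. The document therefore doesn't need to use the prefix, or any prefix at all. A minimal sketch with the same hypothetical sample; countWithPrefix() is a made-up helper and urn:example:other is a deliberately wrong, made-up URI.

```php
<?php
// Hypothetical sample: elements use a default namespace and no prefix at all.
$sample = <<<'XML'
<ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Name>example-bucket</Name>
  <Contents><Key>file-a.zip</Key></Contents>
  <Contents><Key>file-b.zip</Key></Contents>
</ListBucketResult>
XML;

// Count <Contents> elements after mapping an arbitrary prefix to a namespace URI.
function countWithPrefix(string $xml, string $prefix, string $uri): int
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Sample is not well-formed XML.');
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace($prefix, $uri);
    $result = $xpath->query("//{$prefix}:Contents");
    if ($result === false) {
        throw new RuntimeException('Invalid XPath expression.');
    }
    return $result->length;
}

$s3 = 'http://doc.s3.amazonaws.com/2006-03-01';
printf("aws -> %d\n", countWithPrefix($sample, 'aws', $s3)); // prefix used in the answer
printf("s3  -> %d\n", countWithPrefix($sample, 's3', $s3));  // any other prefix works too
printf("aws, wrong URI -> %d\n", countWithPrefix($sample, 'aws', 'urn:example:other')); // URI must match
```

Both prefixes find the two <Contents> elements because they map to the same URI; the mismatched URI finds none.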