Problem 1: Accessing innerXHTML as a string
Imagine the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<id>123456</id>
<content>Hello World</content>
</feed>
Let's say we want to access the <id> value as a string. One would think it could be accessed with:
$xml = simplexml_load_file('file.xml');
print_r($xml->id);
But that's not right: we end up printing a SimpleXMLElement object, like so:
SimpleXMLElement Object
(
[0] => 123456
)
So we get back a new object of which 0 is a property, I guess? There are two ways that seem natural to access this, neither of which works:
//fails with a parse error
$xml = simplexml_load_file('file.xml');
print_r($xml->id->0);
//prints "SimpleXMLElement Object ( [0] => 123456 )"
$xml = simplexml_load_file('file.xml');
print_r($xml->id[0]);
So that leads to Question A: just what is inside of $xml->id? It kind of acts like an object, but it also kind of acts like an array. Ultimately, there are two ways to access this value:
//prints '123456'
$xml = simplexml_load_file('file.xml');
$id = (array) $xml->id;
print_r($id[0]);
//prints '123456'
$xml = simplexml_load_file('file.xml');
print_r($xml->id->__toString());
Of these, the second feels more "right" to me, but I'm left wondering just what is going on here. Question B: Why are $xml->id and $xml->id[0] identical? For that matter, why are $xml->id[0] and $xml->id[0][0][0][0][0][0] also identical?
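For comparison, a (string) cast (the same cast used in the loop under Problem 2) appears to behave like the explicit __toString() call. This is only a minimal sketch: it assumes the feed is loaded from an inline string with simplexml_load_string() rather than from file.xml, and the $source variable is introduced here purely for illustration.

```php
<?php
// Inline copy of the feed from above, so the sketch does not depend on file.xml.
$source = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<id>123456</id>
<content>Hello World</content>
</feed>
XML;

$xml = simplexml_load_string($source);

// Casting the element to a string yields its text content.
$id = (string) $xml->id;
var_dump($id); // string(6) "123456"

// String contexts such as echo perform the same conversion implicitly.
echo $xml->id, "\n"; // 123456
```

Unlike print_r(), which dumps the object, both lines above work with the element's text.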
Problem 2: Dealing with multiple nodes of the same type
Imagine the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<tag>news</tag>
<tag>sports</tag>
<content>Hello World</content>
</feed>
Suppose you want to get a list of all tags. This is where I start to get really confused.
$xml = simplexml_load_file('file.xml');
print_r($xml->tag);
This has the following result:
SimpleXMLElement Object
(
[0] => news
)
That's sensible enough, but this is the part I don't get. We can also do this:
$xml = simplexml_load_file('file.xml');
print_r($xml->tag[1]);
Which prints out this:
SimpleXMLElement Object
(
[0] => sports
)
What the hell? If both tags are available inside $xml->tag then, Question C: why doesn't print_r($xml->tag) print the following:
SimpleXMLElement Object
(
[0] => news
[1] => sports
)
I guess $xml->tag implies $xml->tag[0]? Ultimately, the only way I can see to access a list of all the <tag> elements is with xpath:
$xml = simplexml_load_file('file.xml');
$tags = $xml->xpath('//tag');
//$tags is now an array of objects. We want an array of strings.
foreach ($tags as &$tag) {
    $tag = (string) $tag;
}
unset($tag); //break the reference left over from the by-reference loop
print_r($tags);
Which outputs:
Array
(
[0] => news
[1] => sports
)
But that honestly seems like a lot of code to do something pretty simple and common. So Question D: is there a better way to get a list of values from XML natively in PHP?
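For reference, the xpath version above can at least be shortened by replacing the by-reference loop with array_map(). This is a minimal sketch, not a claim that it is the idiomatic answer: it assumes strval() is applied to each element and that the feed is loaded from an inline string with simplexml_load_string() instead of file.xml ($source is a name made up for this example).

```php
<?php
// Inline copy of the feed from Problem 2, so the sketch does not depend on file.xml.
$source = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<tag>news</tag>
<tag>sports</tag>
<content>Hello World</content>
</feed>
XML;

$xml = simplexml_load_string($source);

// xpath() returns an array of SimpleXMLElement objects;
// strval() converts each one to its text content.
$tags = array_map('strval', $xml->xpath('//tag'));

print_r($tags);
```

This prints the same two-element array of strings as the loop, but it still goes through xpath, so Question D stands.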