
I have an XML document with the following structure:

<posts>
<user id="1222334">
  <post>
    <message>hello</message>
    <client>client</client>
    <time>time</time>
  </post>
  <post>
    <message>hello client how can I help?</message>
    <client>operator</client>
    <time>time</time>
  </post>
</user>
<user id="2333343">
  <post>
    <message>good morning</message>
    <client>client</client>
    <time>time</time>
  </post>
  <post>
    <message>good morning how can I help?</message>
    <client>operator</client>
    <time>time</time>
  </post>
</user>
</posts>

I am able to create the parser and print out the whole document. The problem, however, is that I want to print only the <user> node (and its children) that has a specific id attribute.

My PHP code is:

if (!empty($_GET['id'])) {
    $id = $_GET['id'];
    $parser = xml_parser_create();

    function start($parser, $element_name, $element_attrs)
    {
        switch ($element_name) {
            case "USER":
                echo "-- User --<br>";
                break;
            case "CLIENT":
                echo "Name: ";
                break;
            case "MESSAGE":
                echo "Message: ";
                break;
            case "TIME":
                echo "Time: ";
                break;
            case "POST":
                echo "--Post<br> ";
        }
    }

    function stop($parser, $element_name) { echo "<br>"; }
    function char($parser, $data) { echo $data; }

    xml_set_element_handler($parser, "start", "stop");
    xml_set_character_data_handler($parser, "char");

    $file = "test.xml";
    $fp = fopen($file, "r");
    while ($data = fread($fp, filesize($file))) {
        xml_parse($parser, $data, feof($fp)) or
            die(sprintf("XML Error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
    }
    xml_parser_free($parser);
}

Using this condition in the start() function selects the right node, but it doesn't have any effect on the reading process:

    if(($element_name == "USER") && $element_attrs["ID"] && ($element_attrs["ID"] == "$id"))

Any help would be appreciated.

UPDATE: XMLReader works, but when I add an if statement it stops working:

foreach ($filteredUsers as $user) {
    echo "<table border='1'>";
    foreach ($user->getChildElements('post') as $index => $post) {
        if ($post->getChildElements('client') == "operator") {
            printf("<tr><td class='blue'>%s</td><td class='grey'>%s</td></tr>", $post->getChildElements('message'), $post->getChildElements('time'));
        } else {
            printf("<tr><td class='green'>%s</td><td class='grey'>%s</td></tr>", $post->getChildElements('message'), $post->getChildElements('time'));
        }
    }
    echo "</table>";
}
razz
  • Would it be okay to use [`XMLReader`](http://php.net/book.xmlreader) instead of the expat parser? – hakre Mar 15 '13 at 17:08
  • I prefer to use the Expat parser: it's native to PHP and can handle large XML files, and it's an event-based parser rather than DOM. I find it fast and powerful, and I especially like the `xml_set_element_handler` function, which helps with defining the start and end tags easily. I'm sure there must be an option to read part of the document!! – razz Mar 15 '13 at 17:51
  • `XMLReader` is native to PHP and can handle large XML files, it is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way. And for Expat: No there is no such option, but for XMLReader there is ;) That's why I'm asking. – hakre Mar 15 '13 at 17:55
  • that sounds good, if its not DOM parser, doesn't use lots of memory, doesn't require installations, fast and there's no way `Expat` can do the job for me... then `XMLReader` would be great and i would really appreciate if you can show me how to use it to solve my problem :) – razz Mar 15 '13 at 18:24
  • This answer shows how to turn specific elements into an XML chunk of its own (here as SimpleXMLElement for *only* some elements): http://stackoverflow.com/a/15351723/367456 – hakre Mar 15 '13 at 19:04

2 Answers

As suggested in a comment earlier, you can alternatively use XMLReader.

The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.

It is a class (with the same name, XMLReader) that can open a file. You use read() to move to the next node, or next() to move on while skipping the current node's subtree. You then check whether the current position is at an element and whether that element has the name you're looking for. If so, you can process it, for example by reading the outer XML of the element with XMLReader::readOuterXml().
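The steps just described can be sketched with the plain XMLReader class alone. This is a minimal sketch, not the method used in the rest of this answer: the function name `findUserXml` and the inlined sample data are assumptions made for illustration, and a real script would call `$reader->open($file)` on the file instead of `XML()` on a string.

```php
<?php
// Sample data inlined so the sketch is self-contained (shape taken from the question).
$xml = <<<XML
<posts>
  <user id="1222334">
    <post><message>hello</message><client>client</client><time>time</time></post>
  </user>
  <user id="2333343">
    <post><message>good morning</message><client>client</client><time>time</time></post>
  </user>
</posts>
XML;

// Hypothetical helper: returns the outer XML of the <user> with the given id, or null.
function findUserXml(string $xml, string $id): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $found = null;
    $moved = $reader->read();
    while ($moved) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'user') {
            if ($reader->getAttribute('id') === $id) {
                $found = $reader->readOuterXml();
                break;
            }
            // Not the user we want: skip its whole subtree.
            $moved = $reader->next();
        } else {
            // Any other node: step forward one node.
            $moved = $reader->read();
        }
    }
    $reader->close();

    return $found;
}

echo findUserXml($xml, '2333343'), "\n";
```

Calling `next()` on a non-matching `<user>` means its posts are never visited one by one, which is the part the Expat callbacks cannot skip.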

Compared with the callbacks in the Expat parser, this is a little burdensome. To gain more flexibility with XMLReader, I normally write iterators that work on the XMLReader object and provide the steps I need.

They allow you to iterate over the concrete elements directly with foreach. Here is such an example:

require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685

$xmlFile = '../data/posts.xml';

$ids = array(3, 8);

$reader = new XMLReader();
$reader->open($xmlFile);

/* @var $users XMLReaderNode[] - iterate over all <user> elements */
$users = new XMLElementIterator($reader, 'user');

/* @var $filteredUsers XMLReaderNode[] - iterate over elements with id="3" or id="8" */
$filteredUsers = new XMLAttributeFilter($users, 'id', $ids);

foreach ($filteredUsers as $user) {
    printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
    echo $user->readOuterXml(), "\n";
}

I have created an XML file that contains more posts like those in your question, with the id attribute numbered from one upward:

$xmlFile = '../data/posts.xml';

Then I created an array with the two ID values of the users I am interested in:

$ids = array(3, 8);

It will be used in the filter condition later. Then the XMLReader is created and opens the XML file:

$reader = new XMLReader();
$reader->open($xmlFile);

The next step creates an iterator over all <user> elements of that reader:

$users = new XMLElementIterator($reader, 'user');

These are then filtered by the id attribute values stored in the array earlier:

$filteredUsers = new XMLAttributeFilter($users, 'id', $ids);

With all conditions formulated, the rest is iterating with foreach:

foreach ($filteredUsers as $user) {
    printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
    echo $user->readOuterXml(), "\n";
}

This outputs the XML of the users with the IDs 3 and 8:

---------------
User with ID 3:
<user id="3">
        <post>
            <message>message</message>
            <client>client</client>
            <time>time</time>
        </post>
    </user>
---------------
User with ID 8:
<user id="8">
        <post>
            <message>message 8.1</message>
            <client>client</client>
            <time>time</time>
        </post>
        <post>
            <message>message 8.2</message>
            <client>client</client>
            <time>time</time>
        </post>
        <post>
            <message>message 8.3</message>
            <client>client</client>
            <time>time</time>
        </post>
    </user>
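For readers who do not want to pull in the iterator library, a similar filter can be approximated with a PHP generator over a plain XMLReader. This is only a sketch under assumptions: `usersWithIds` is a hypothetical name, it is not how XMLElementIterator or XMLAttributeFilter are implemented, and the inlined sample data (ids 1, 3 and 8) is made up for illustration.

```php
<?php
// Inline sample data; the ids 1, 3 and 8 are made up for this sketch.
$xml = <<<XML
<posts>
  <user id="1"><post><message>message 1</message></post></user>
  <user id="3"><post><message>message 3</message></post></user>
  <user id="8"><post><message>message 8.1</message></post></user>
</posts>
XML;

// Hypothetical stand-in for an element iterator plus attribute filter:
// yields id => outer XML for every <user> whose id is listed in $ids.
function usersWithIds(XMLReader $reader, array $ids): Generator
{
    $ids = array_map('strval', $ids); // attribute values are always strings

    $moved = $reader->read();
    while ($moved) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'user') {
            $id = $reader->getAttribute('id');
            if (in_array($id, $ids, true)) {
                yield $id => $reader->readOuterXml();
            }
            // Matching or not, continue after this user's subtree.
            $moved = $reader->next();
        } else {
            $moved = $reader->read();
        }
    }
}

$reader = new XMLReader();
$reader->XML($xml);
foreach (usersWithIds($reader, array(3, 8)) as $id => $userXml) {
    printf("---------------\nUser with ID %d:\n%s\n", $id, $userXml);
}
$reader->close();
```

The generator keeps the forward-only nature of the reader: each user is produced once, in document order, and the loop cannot be rewound without opening the document again.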

The XMLReaderNode, which is part of the XMLReader iterators, also provides a SimpleXMLElement in case you want to easily read values inside the <user> element.

The following example shows how to get the count of <post> elements inside the <user> element:

foreach ($filteredUsers as $user) {
    printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
    echo $user->readOuterXml(), "\n";
    echo "Number of posts: ", $user->asSimpleXML()->post->count(), "\n";
}

This would then display Number of posts: 1 for the user ID 3 and Number of posts: 3 for the user ID 8.
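Having the whole <user> element as a SimpleXMLElement also makes it easy to branch on one child value while printing the others, which is what the table in the question's update needs. The following is a sketch with plain XMLReader and `simplexml_load_string()` rather than the iterator library; `renderUserRows` and the inlined sample data are assumptions for illustration.

```php
<?php
// Sample data inlined so the sketch is self-contained (shape taken from the question).
$xml = <<<XML
<posts>
  <user id="1222334">
    <post><message>hello</message><client>client</client><time>time</time></post>
    <post><message>hello client how can I help?</message><client>operator</client><time>time</time></post>
  </user>
</posts>
XML;

// Hypothetical helper: builds one table row per <post> of the user with the given id.
function renderUserRows(string $xml, string $id): array
{
    $rows = [];
    $reader = new XMLReader();
    $reader->XML($xml);

    while ($reader->read()) {
        $isWantedUser = $reader->nodeType === XMLReader::ELEMENT
            && $reader->name === 'user'
            && $reader->getAttribute('id') === $id;
        if (!$isWantedUser) {
            continue;
        }

        // Load only this user's subtree; all child values stay available,
        // so the order of <message>, <client> and <time> no longer matters.
        $user = simplexml_load_string($reader->readOuterXml());
        foreach ($user->post as $post) {
            $class = ((string) $post->client === 'operator') ? 'blue' : 'green';
            $rows[] = sprintf(
                "<tr><td class='%s'>%s</td><td class='grey'>%s</td></tr>",
                $class,
                htmlspecialchars((string) $post->message),
                htmlspecialchars((string) $post->time)
            );
        }
        break;
    }
    $reader->close();

    return $rows;
}

echo "<table border='1'>\n", implode("\n", renderUserRows($xml, '1222334')), "\n</table>\n";
```

Only the matching user's subtree is held in memory, so this stays cheap as long as a single <user> element is reasonably small.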

However, if the outer XML is large, you don't want to load it into SimpleXML; you want to continue iterating inside that element instead:

// rewind
$reader->open($xmlFile);

foreach ($filteredUsers as $user) {
    printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
    foreach ($user->getChildElements('post') as $index => $post) {
        printf(" * #%d: %s\n", ++$index, $post->getChildElements('message'));
    }
    echo "Number of posts: ", $index, "\n";
}

Which produces the following output:

---------------
User with ID 3:
 * #1: message 3
Number of posts: 1
---------------
User with ID 8:
 * #1: message 8.1
 * #2: message 8.2
 * #3: message 8.3
Number of posts: 3

This example shows that, depending on how large the nested children are, you can either traverse further with the iterators available via getChildElements(), or use a common XML parser such as SimpleXML or even DOMDocument on a subset of the XML.
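The same inner traversal can also be sketched with plain XMLReader by remembering the depth of the matching <user> element and reading forward until its closing tag. This is a sketch under assumptions, not the library's implementation: `streamMessages` is a hypothetical name and the sample data is inlined for illustration.

```php
<?php
// Inline sample data modelled on the user with ID 8 from the output above.
$xml = <<<XML
<posts>
  <user id="3">
    <post><message>message 3</message><client>client</client><time>time</time></post>
  </user>
  <user id="8">
    <post><message>message 8.1</message><client>client</client><time>time</time></post>
    <post><message>message 8.2</message><client>client</client><time>time</time></post>
    <post><message>message 8.3</message><client>client</client><time>time</time></post>
  </user>
</posts>
XML;

// Hypothetical helper: collects the <message> texts of one user without
// building the user's subtree in memory.
function streamMessages(string $xml, string $id): array
{
    $messages = [];
    $reader = new XMLReader();
    $reader->XML($xml);

    $userDepth = null; // depth of the matching <user>, null while outside it
    while ($reader->read()) {
        if ($userDepth === null) {
            if ($reader->nodeType === XMLReader::ELEMENT
                && $reader->name === 'user'
                && $reader->getAttribute('id') === $id) {
                if ($reader->isEmptyElement) {
                    break; // <user id="..."/> has no children and no end tag
                }
                $userDepth = $reader->depth;
            }
            continue;
        }
        if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $userDepth) {
            break; // reached the closing </user>
        }
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'message') {
            $messages[] = $reader->readString();
        }
    }
    $reader->close();

    return $messages;
}

foreach (streamMessages($xml, '8') as $index => $message) {
    printf(" * #%d: %s\n", $index + 1, $message);
}
```

Because the reader only moves forward, each value has to be captured when the cursor passes it; anything needed later for a decision must be stored in a variable at that point.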

hakre
  • it works but it's printing the results in one line: `client1 - message1 - time1 - client2 - message2 - time2....` is there a way i can customize the output like `if($client = "operater"){echo messagetime)}else{....}`? – razz Mar 16 '13 at 11:25
  • Sure, you're not limited with the output. I just use plain text in the example to keep it small, but you can use HTML instead if you prefer. – hakre Mar 16 '13 at 11:28
  • i tried this `if( $post->getChildElements('client') == "operater" ){...}else{...}` and this `if( $post->getChildElements('client')->item(0) == "operater" ){...}else{...}` in `foreach ($filteredUsers as $user)` but it doesn't seem to be working!! – razz Mar 16 '13 at 11:49
  • Ah, you have a decision in there. If you need to re-use a value you should not iterate further, because XMLReader is forward-only. That means you need to store the current element you want to base the decision on. If you want a more detailed example, you must provide more of your XML as well; otherwise it's not clear enough for me to give a good suggestion. – hakre Mar 16 '13 at 12:03
  • i updated my XML file and added foreach loop that i want to use, after some tests i found out that `if( $post->getChildElements('client') == "operater" )` is working but it's not printing the messages or times! – razz Mar 16 '13 at 12:35
  • Okay, I can see this now; this is what I described above. The moment you iterate over the child elements with `$post->getChildElements('client')`, the reader is already at the `<client>` element. That means you can not reach `<message>` any longer because it is before `<client>`. Use the `asSimpleXML()` or `toArray()` functionality instead. It stores all those values and you can work on them easily. – hakre Mar 16 '13 at 13:05
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/26292/discussion-between-hakre-and-razzak) – hakre Mar 16 '13 at 13:14

You can use PHP Simple HTML DOM Parser (an HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way). You can query your data much like you would with jQuery. It supports HTML, so it should also be able to handle an XML document.

You can download and view the document here: http://simplehtmldom.sourceforge.net/

Vinh TRAN