
Another question addresses how to parse large XML documents in PHP using streams, so that the whole document does not have to be loaded into memory.

However, the XMLReader class does not seem suited to parsing huge text nodes inside an XML document. An API I'm using sends base64-encoded files as values in an XML document, together with some metadata, so I'm looking for a way to stream those text nodes rather than getting the value back as a single string:

<?php
$reader = XMLReader::open($someStream);

// $reader->read() until a node is reached

// The following puts the whole text node in memory, rather than creating a stream
$content = $reader->value; 
?>

Is it possible to turn $reader->value into a stream?

Harmen

1 Answer


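What I came up with uses PHP's low-level XML Parser extension (the xml_* functions) together with some stream functions. The input and output are opened as streams, with a base64-decoding filter attached to the output: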

$input = fopen('input.xml', 'r');
$output = fopen('output.txt', 'w');
stream_filter_append($output, 'convert.base64-decode');
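The approach depends on the filter decoding correctly even when the base64 text arrives in arbitrary pieces, which is how the character handler will deliver it. The following is a minimal sketch to check that behaviour in isolation; the temporary file, the sample string and the 5-byte piece size are assumptions made for the demonstration and are not part of the original code.

```php
<?php
// Demo only: decode base64 while writing, feeding the data in uneven pieces.
$path = tempnam(sys_get_temp_dir(), 'b64');
$output = fopen($path, 'w');

// STREAM_FILTER_WRITE restricts the filter to data written to the stream.
stream_filter_append($output, 'convert.base64-decode', STREAM_FILTER_WRITE);

$encoded = base64_encode('Hello, streaming world!');

// 5-byte pieces deliberately do not align with base64's 4-character groups.
foreach (str_split($encoded, 5) as $piece) {
    fwrite($output, $piece);
}

// Closing the stream flushes anything the filter is still buffering.
fclose($output);

$decoded = file_get_contents($path);
unlink($path);
```

If the filter keeps state between writes as expected, $decoded should equal the original string.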

These are passed to a class which creates an XML Parser:

public function __construct($input, $output) {
    // ...
    $this->xml = xml_parser_create();
    xml_set_object($this->xml, $this);
    xml_set_element_handler($this->xml, 'start', 'end');
    xml_set_character_data_handler($this->xml, 'character');
}

The start and end methods are used to find the correct element in the XML, and the character method writes the contents to the output stream:

protected function character($parser, $data)
{
    if ($this->match()) {
        fwrite($this->output, $data);
    }
}
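The start, end and match methods are not shown above. As a rough sketch of how they could fit together, the class below tracks whether the parser is currently inside one target element and copies only that element's character data to the output. The class name, the feed() method, the $inside flag and the element name "file" are all hypothetical; the handlers are registered as callables instead of through xml_set_object(), and the real matching logic may well be more involved (for example, tracking the full element path).

```php
<?php
// Sketch only: names and the simple "inside one element" rule are assumptions.
final class TextNodeExtractor
{
    private $xml;
    private $output;
    private $target;
    private $inside = false;

    public function __construct($output, string $target)
    {
        $this->output = $output;
        $this->target = $target;
        $this->xml = xml_parser_create();
        // Disable case folding so element names are reported as written.
        xml_parser_set_option($this->xml, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($this->xml, [$this, 'start'], [$this, 'end']);
        xml_set_character_data_handler($this->xml, [$this, 'character']);
    }

    public function start($parser, string $name, array $attributes): void
    {
        if ($name === $this->target) {
            $this->inside = true;
        }
    }

    public function end($parser, string $name): void
    {
        if ($name === $this->target) {
            $this->inside = false;
        }
    }

    public function character($parser, string $data): void
    {
        // May be called several times for a single text node.
        if ($this->inside) {
            fwrite($this->output, $data);
        }
    }

    public function feed(string $chunk, bool $isFinal): void
    {
        xml_parse($this->xml, $chunk, $isFinal);
    }
}

// Usage: feed a small document in 7-byte chunks and capture the <file> text.
$sink = fopen('php://memory', 'w+');
$extractor = new TextNodeExtractor($sink, 'file');
$document = '<doc><name>a.txt</name><file>SGVsbG8=</file></doc>';
foreach (str_split($document, 7) as $chunk) {
    $extractor->feed($chunk, false);
}
$extractor->feed('', true); // signal the end of the document

rewind($sink);
$captured = stream_get_contents($sink);
```

No decoding filter is attached here, so $captured holds the raw base64 text of the target element; attaching convert.base64-decode to the output stream would decode it on the way through.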

The memory efficiency comes from how the parser is called: it is fed manageable chunks one at a time, so the whole document is never held in memory:

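$bufferSize = 1024;
// Stop once the target is done; compare strictly, since a chunk such as "0" is falsy.
while (!$this->done && ($data = fread($this->input, $bufferSize)) !== false && $data !== '') {
    // The third argument tells the parser whether this is the final chunk.
    xml_parse($this->xml, $data, feof($this->input));
}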

The $this->done flag can be set in the start or end handlers. In my case, I removed the handlers entirely as soon as a match was found.

Since these old PHP functions report parse failures through return values rather than exceptions, some safety checks still need to be added, such as checking the results of fopen() and xml_parse().
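One possible form of such a check is sketched below: a small wrapper that turns a failed xml_parse() call into an exception, using the extension's own error functions. The function name parseChunkOrFail and the sample input are hypothetical; the same pattern would apply inside the read loop.

```php
<?php
// Hypothetical helper: convert a failed xml_parse() call into an exception.
function parseChunkOrFail($parser, string $chunk, bool $isFinal): void
{
    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $chunk, $isFinal) === 1) {
        return;
    }

    $code = xml_get_error_code($parser);
    throw new RuntimeException(sprintf(
        'XML error "%s" at line %d',
        xml_error_string($code),
        xml_get_current_line_number($parser)
    ));
}

$parser = xml_parser_create();
$message = null;

try {
    // Mismatched tags: <b> is never closed before </a>.
    parseChunkOrFail($parser, '<a><b></a>', true);
} catch (RuntimeException $e) {
    $message = $e->getMessage();
}
```

The exact error text depends on the underlying XML library, so it is better to log or display $message than to compare it against a fixed string.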

Harmen