I use PHP's `zip://` stream wrapper to parse large XML files straight out of a ZIP archive with `XMLReader`, one node at a time. For example:
```php
$stream_uri = 'zip://' . __DIR__ . '/archive.zip#foo.xml';
$reader = new XMLReader();
$reader->open( $stream_uri, null );
$reader->read();
while ( true ) {
    echo( $reader->readInnerXml() . PHP_EOL );
    if ( ! $reader->next() ) {
        break;
    }
}
```
Quite often an XML file includes dodgy control characters that `XMLReader` doesn't like. So I'd like to implement a custom stream wrapper that I can pass the output of the `zip://` stream to, and which runs a `preg_replace` on the data as it is read to remove those characters.
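A minimal sketch of the replacement step itself, assuming the "dodgy" characters are the C0 control characters that XML 1.0 forbids (everything below U+0020 except tab, line feed and carriage return). `strip_xml_control_chars()` is a hypothetical helper name, not an existing PHP function. The pattern deliberately has no `/u` modifier, so it works byte by byte and stays correct when applied to arbitrary chunks of a stream.

```php
<?php
/**
 * Hypothetical helper: remove the C0 control characters that XML 1.0
 * does not allow (0x00-0x1F except tab 0x09, LF 0x0A and CR 0x0D).
 */
function strip_xml_control_chars(string $text): string
{
    // No /u modifier: the pattern matches single bytes, which is safe even
    // if $text is a chunk that cuts a multi-byte UTF-8 character in half,
    // because bytes 0x00-0x1F never occur inside a multi-byte sequence.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
}
```

Inside a wrapper's read method it would be called on each chunk, e.g. `strip_xml_control_chars($chunk)`.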
My dream is to be able to do this:
```php
stream_wrapper_register( 'xmlchars', 'XML_Chars' );
$stream_uri = 'xmlchars://zip://' . __DIR__ . '/archive.zip#foo.xml';
```
and have `XMLReader` happily read the tidied-up nodes. I've figured out a way to reconstruct the zip stream URI from the path passed to my wrapper:
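```php
class XML_Chars {
    // PHP sets this property on wrapper instances; declaring it avoids a
    // dynamic-property deprecation notice on PHP 8.2+.
    public $context;

    protected $stream_uri = '';
    protected $handle;

    function stream_open( $path, $mode, $options, &$opened_path ) {
        // $path looks like 'xmlchars://zip:///path/to/archive.zip#foo.xml';
        // rebuild the inner 'zip://...#foo.xml' URI from its parts.
        $parsed_url = parse_url( $path );
        $this->stream_uri = 'zip:' . $parsed_url['path'] . '#' . $parsed_url['fragment'];
        return true;
    }
}
```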
But I'm puzzled about the best way to open the `zip://` stream so I can modify its output and pass the result through to `XMLReader`. Can anyone give me any pointers on how to implement that?
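One possible approach, sketched under stated assumptions rather than offered as a definitive implementation: the wrapper opens the inner stream itself with `fopen()` inside `stream_open()`, then filters every chunk in `stream_read()`. Instead of the `parse_url()` reconstruction above, this sketch simply strips the `xmlchars://` prefix and treats the rest as the inner URI (an assumption that holds for the `xmlchars://zip://...` form). Only C0 control characters are removed. The `url_stat()` method is included because libxml may stat a URI before opening it and give up if a userland wrapper cannot answer; treat that as something to verify on your PHP version.

```php
<?php
/**
 * Sketch of a read-only filtering stream wrapper.
 * Assumption: everything after "xmlchars://" is the inner URI, e.g.
 * "xmlchars://zip:///path/archive.zip#foo.xml" -> "zip:///path/archive.zip#foo.xml".
 */
class XML_Chars
{
    const PREFIX = 'xmlchars://';

    // PHP sets this property on wrapper instances; declaring it avoids
    // a dynamic-property deprecation notice on PHP 8.2+.
    public $context;

    /** @var resource|null The wrapped (inner) stream, e.g. the zip:// entry. */
    protected $handle = null;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $inner = substr($path, strlen(self::PREFIX));
        if ($inner === '' || $mode[0] !== 'r') {
            return false; // this sketch only supports reading
        }
        // Only let fopen() emit warnings when the caller asked for them.
        $handle = ($options & STREAM_REPORT_ERRORS)
            ? fopen($inner, 'rb')
            : @fopen($inner, 'rb');
        if ($handle === false) {
            return false;
        }
        $this->handle = $handle;
        return true;
    }

    public function stream_read($count)
    {
        // Keep reading until something survives the filter or the inner
        // stream ends: returning '' early would look like EOF to the reader.
        do {
            $chunk = fread($this->handle, $count);
            if ($chunk === false) {
                return false;
            }
            // Byte-wise pattern (no /u), so chunk boundaries inside a
            // multi-byte UTF-8 character cannot break it.
            $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $chunk);
        } while ($clean === '' && !feof($this->handle));
        return $clean;
    }

    public function stream_eof()
    {
        return feof($this->handle);
    }

    public function stream_stat()
    {
        // Describes the unfiltered inner stream; may be false if the
        // inner wrapper does not support fstat().
        return fstat($this->handle);
    }

    public function stream_close()
    {
        fclose($this->handle);
        $this->handle = null;
    }

    // Report "stat succeeded" with an empty array so callers that stat
    // the URI before opening it (libxml may do this) do not bail out.
    public function url_stat($path, $flags)
    {
        return [];
    }
}
```

Usage would then match the original idea: `stream_wrapper_register('xmlchars', 'XML_Chars');` followed by `$reader->open('xmlchars://zip://' . __DIR__ . '/archive.zip#foo.xml');`. PHP's stream filters (a `php_user_filter` subclass registered with `stream_filter_register()` and applied through a `php://filter/read=<name>/resource=<uri>` URI) are another way to transform a stream's data without writing a full wrapper.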