I am trying to write a script to process (clean up, reformat) an HTML file, using DOM. Here is my code for loading the file:
$dom = new DOMDocument();
$dom->loadHTML($htmFName, LIBXML_PARSEHUGE);
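A note on the loading step: `DOMDocument::loadHTML()` expects the markup itself as a string, while `DOMDocument::loadHTMLFile()` reads it from a path. If `$htmFName` holds a file name (an assumption based on the variable name only), a minimal sketch of the file-based call might look like this. The temporary file and its contents are hypothetical stand-ins for the real input.

```php
<?php
// Hypothetical temp file standing in for the real 2.6 MB input.
$htmFName = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($htmFName, '<html><body><p>hello</p></body></html>');

$dom = new DOMDocument();

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// loadHTMLFile() takes a path; loadHTML() would treat the same
// string as markup and parse the file name as text.
$loaded = $dom->loadHTMLFile($htmFName, LIBXML_PARSEHUGE);
libxml_clear_errors();

$paragraphs = $dom->getElementsByTagName('p');
echo 'paragraphs found: ', $paragraphs->length, "\n";

unlink($htmFName);
```

If `$htmFName` already contains the file contents, `loadHTML()` is the right call and nothing here applies.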
And here is my code for traversing the document and inspecting/modifying the nodes:
class DOMTraverser
{
    private $node;

    public function __construct(DOMNode $node)
    {
        $this->node = $node;
    }

    public function traverse(GeneralCallBack $cb, $param) {
        $cb->callBefore($this->node, $param);
        foreach ($this->node->childNodes as $subnode) {
            if ($subnode->hasChildNodes()) {
                // $trav = new DOMTraverser($subnode);
                // $trav->traverse($cb, $param);
                $this->traverse($cb, $param);
            }
        }
        $cb->callAfter($this->node);
    }
}
...
$trav = new DOMTraverser($dom);
$callback = new StoryDocCallback();
$trav->traverse($callback, $storyParms);
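For comparison, here is a sketch of the traversal with the two commented-out lines restored. In the code above, the active call `$this->traverse($cb, $param)` re-enters `traverse` on the same `$this->node` whenever a child has children, so it never descends and the recursion has no end. The interface `NodeCallback` and the class `CountingCallback` are hypothetical stand-ins, since `GeneralCallBack` and `StoryDocCallback` are not shown, and the class is renamed `RecursiveDOMTraverser` to keep it distinct from the original.

```php
<?php
// Hypothetical stand-in for GeneralCallBack, which is not shown above.
interface NodeCallback
{
    public function callBefore(DOMNode $node, $param);
    public function callAfter(DOMNode $node);
}

class RecursiveDOMTraverser
{
    private $node;

    public function __construct(DOMNode $node)
    {
        $this->node = $node;
    }

    public function traverse(NodeCallback $cb, $param)
    {
        $cb->callBefore($this->node, $param);
        foreach ($this->node->childNodes as $subnode) {
            if ($subnode->hasChildNodes()) {
                // Descend into the child rather than re-entering $this->node.
                $trav = new RecursiveDOMTraverser($subnode);
                $trav->traverse($cb, $param);
            }
        }
        $cb->callAfter($this->node);
    }
}

// Hypothetical callback that only counts visits.
class CountingCallback implements NodeCallback
{
    public $before = 0;
    public $after = 0;

    public function callBefore(DOMNode $node, $param) { $this->before++; }
    public function callAfter(DOMNode $node) { $this->after++; }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div><p>one</p><p>two</p></div></body></html>');

$counter = new CountingCallback();
(new RecursiveDOMTraverser($dom))->traverse($counter, null);
echo "visited: {$counter->before}\n";
```

As in the original, nodes without children (text nodes, empty elements) are never passed to the callback. The recursion depth here is bounded by the nesting depth of the document, not by the number of tags.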
The problem is reported in the foreach statement of the traverse function:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in D:\D\src\inc\DOMTraverser.cl on line 17
My input file is large (2.6MB, with nearly 15,000 tags), but nowhere near the 134MB size mentioned in the error message.
How can I process this file without running out of memory? Would I be better off doing this in Java?
Side note: while the "allowed memory size" of 134,217,728 bytes seems like a lot, it's actually rather small compared with the 6 GB of memory on my system. Maybe there's a configuration variable I could change?
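The setting behind that message is `memory_limit`, which defaults to `128M` (134,217,728 bytes). It can be changed in `php.ini` or, for a single script, with `ini_set()`. The sketch below shows the per-script form; the value `512M` is an arbitrary example, not a recommendation.

```php
<?php
// Read the current limit; the stock default is "128M".
$before = ini_get('memory_limit');

// Raise the limit for this script only. 512M is an arbitrary example value.
ini_set('memory_limit', '512M');
$after = ini_get('memory_limit');

echo "memory_limit: $before -> $after\n";

// Peak memory actually requested from the system so far, in bytes.
echo 'peak usage: ', memory_get_peak_usage(true), " bytes\n";
```

Raising the limit only helps when memory use is bounded. If usage keeps growing until any limit is hit, the cause is more likely in the code than in the configuration.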
PHP 7.0.8