I'm building a command-line PHP scraping app that uses XPath to analyze HTML. The problem is that every time a new DOMXPath instance is created inside a loop, I leak memory roughly equal to the size of the document being loaded. The script runs and runs, slowly building up memory usage until it hits the memory limit and quits.
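For reference, I'm watching the growth with memory_get_usage() inside the loop, something like this ($urls and fetchPage() are placeholders for my actual crawl code):

foreach ($urls as $url) {
    scrape(fetchPage($url)); // placeholder for the DOM/XPath work shown below
    echo memory_get_usage(true) . " bytes in use" . PHP_EOL; // climbs every pass
}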
I've tried forcing garbage collection with gc_collect_cycles(), but PHP still isn't reclaiming the memory from old XPath queries. In fact, the DOMXPath class doesn't even seem to define a destructor.
So my question is: is there any way to force garbage collection on a DOMXPath instance after I've extracted the data I need? Calling unset() on the instance predictably does nothing.
The code is nothing special, just standard XPath usage:
// Instantiated once, outside the loop
$this->dom = new DOMDocument();

// Inside the loop
$this->dom->loadHTML($output);
$xpath = new DOMXPath($this->dom);
$nodes = $xpath->query("//span[@class='ckass']");
// unset($this->dom) and unset($xpath) don't seem to have any effect
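Putting the cleanup attempts in context, each iteration ends up looking roughly like this (again, $urls and fetchPage() are stand-ins for my actual crawl code):

foreach ($urls as $url) {
    $output = fetchPage($url);
    $this->dom->loadHTML($output);
    $xpath = new DOMXPath($this->dom);
    $nodes = $xpath->query("//span[@class='ckass']");
    // ... pull the data I need out of $nodes ...
    unset($nodes, $xpath); // no measurable effect on memory
    gc_collect_cycles();   // also no measurable effect
}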
As you can see above, I've kept the instantiation of the DOMDocument outside of the loop, although that doesn't seem to help. I've even tried moving the $xpath instance out of the loop and binding the DOM to it once via its constructor; the memory growth is the same.
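For completeness, that constructor variant looked roughly like this, and it leaks at the same rate per iteration:

// Variant: bind DOMXPath to the document once, up front, and reuse it
$this->dom = new DOMDocument();
$xpath = new DOMXPath($this->dom);
foreach ($urls as $url) {
    $this->dom->loadHTML(fetchPage($url)); // fetchPage() again a placeholder
    $nodes = $xpath->query("//span[@class='ckass']");
    // ... extract data from $nodes ...
}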