
I tried to check memory consumption for large XML files (over 1GB) by loading them into DOMDocument (Win/CLI PHP 8.0.3).

I've noticed that the memory used by the process increases rapidly while loading the file:

$dom = new DOMDocument();
$dom->load($path);

The process used ca. 3 GB of memory for a 1 GB XML file, while PHP's memory limit was set to 128M. Moreover, PHP reported almost no memory usage (memory_get_usage: 152 bytes declared and 0 bytes real). After performing an XPath query, the memory reported by PHP rose by 32 MB.
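A minimal sketch of the measurement described above, assuming `$path` points to a suitably large XML file (the path itself is a placeholder). It compares PHP's "real" reported memory before and after loading; because libxml2 allocates the DOM tree with its own allocator, the delta stays small even though the OS-level process footprint grows by gigabytes:

```php
<?php
// Placeholder path -- replace with a real large XML file.
$path = 'large.xml';

$before = memory_get_usage(true);   // memory PHP's own allocator has reserved

$dom = new DOMDocument();
$dom->load($path);                  // the tree is built by libxml2, outside PHP's allocator

$after = memory_get_usage(true);

// This delta stays tiny: the DOM tree is invisible to
// memory_get_usage() and therefore also to memory_limit.
printf("PHP-reported delta: %d bytes\n", $after - $before);
```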

I observed similar behaviour with SimpleXMLElement, and also under Apache/PHP.

It looks like DOMDocument and SimpleXMLElement use some external memory that is included in the memory used by the process but not tracked by PHP. When I tried a 4 GB XML file I reached the limit of my physical memory (16 GB), but the process did not crash.
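For files of this size, a streaming parser avoids the problem entirely, since it never materializes the whole tree. A minimal sketch using XMLReader, where the element name `item` is a placeholder for whatever the real document contains:

```php
<?php
// Placeholder path -- replace with the real XML file.
$path = 'large.xml';

$reader = new XMLReader();
$reader->open($path);

while ($reader->read()) {
    // React only to opening <item> elements; everything else streams past.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // expand() builds a DOMNode for just this one element,
        // so peak memory stays roughly constant regardless of file size.
        $node = $reader->expand();
        // ... process $node, then let it go out of scope
    }
}

$reader->close();
```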

Does anybody know why this happens and what memory is used while loading XML? (Or maybe you know how to control it...)

Alek
  • There is a [very old bug](https://bugs.php.net/bug.php?id=42968) with a comment saying that because it is in an extension, the memory limit doesn’t get applied – Chris Haas Jul 22 '21 at 11:52
  • In general: For XML files of these sizes, you might want to look into a stream parser. https://stackoverflow.com/questions/3048583/what-is-the-fastest-xml-parser-in-php – CBroe Jul 22 '21 at 12:08
  • It's certainly entirely to be expected that a DOM for a 1Gb document would occupy 3 to 10 Gb of memory, depending on the density of markup. As for PHP not reporting or capping the memory usage, that's purely a question of whether PHP knows what's going on outside its own boundaries. – Michael Kay Jul 22 '21 at 16:10

0 Answers