Is there a 'simple' way to check whether an XML file has valid syntax? I'm using PHP's XMLReader.
I'm in this situation: I have multiple XML files whose structure changes a lot, so I can't do an XMLReader::isValid() check against a DTD file. That is not needed per se, though. I only want to know whether the syntax is OK, because sometimes an XML file is corrupted, for example at the end. I would like to check this before iterating over the nodes.
The other thing is that some files are over 2GB in size, so I can't do a simple DOM check without using a lot of memory.
What should I do?
Of course I tried the approach suggested in the comments, and it works great, but only for small files:
$dom = new DOMDocument;
if(!@$dom->load('example.xml')){ die("syntax error"); }
Larger files eat up all the memory and crash.
When I open a large XML file in a simple XML viewer such as "firstobject XML editor", it shows me the line with the syntax error within milliseconds (even for a 30GB XML file it takes only 1.7 seconds). Something like this should be possible with PHP's XMLReader, I guess?
Edit: For the moment I will use the option above, but do a file size check first. If the file is below a certain size (I'm still testing what the maximum is), the syntax is checked with DOM. For the bigger files I will build an option as @IMSoP suggested below, using a third-party tool and a command-line check. I will update this if I find a stable solution.
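To make the size-gated idea concrete, here is a minimal sketch. The function name quickSyntaxCheck and the 50 MB threshold are placeholders I made up for illustration (the real limit is still being tested), and a null result only means "too large, check it some other way":

```php
<?php
// Hypothetical threshold: the usable maximum still has to be determined by testing.
const MAX_DOM_BYTES = 50 * 1024 * 1024;

/**
 * Returns true or false for files small enough to load with DOMDocument,
 * or null when the file is too large (or its size is unknown) and needs
 * a different kind of check.
 */
function quickSyntaxCheck(string $path, int $maxBytes = MAX_DOM_BYTES): ?bool
{
    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        return null; // not checked here
    }

    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = $dom->load($path);

    // Restore the previous libxml error state so other code is unaffected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}
```

This replaces the @ suppression from the snippet above with libxml_use_internal_errors(), which keeps the warnings out of the output without hiding unrelated errors.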
Edit 2: The idea from Progman (answer below) is the best I've seen so far. The only drawback is that it iterates over the entire XML file. Processing already takes quite some time, and this will double it. I was hoping for a quick validation option, but maybe that is not even possible. I'll wait a little to see if there are any other options; otherwise I think I should accept Progman's answer as the best option for large files.
Edit 3 (solution): Alright, I fine-tuned Progman's solution so that it works without set_error_handler, because I'm already using that for custom error handling. What fits best for me is to suppress the errors by calling libxml_use_internal_errors(true) and to check them afterwards with libxml_get_errors(). A short example:
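libxml_use_internal_errors(true); // collect libxml errors instead of emitting warnings
$xml = new XMLReader();
if (!$xml->open("large.xml")) {
    die("cannot open file");
}
while ($xml->read()); // stream through the document; stops at the end or at the first fatal error
$xml->close();
foreach (libxml_get_errors() as $error) {
    print_r($error); // LibXMLError object with message, line and column
}
libxml_clear_errors(); // empty the error buffer for the next file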
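Since I have many files to check, the same idea can be wrapped in a reusable function. This is only a sketch: the name xmlSyntaxErrors is my own, and the @ in front of read() is a precaution on the assumption that some PHP versions also raise a generic warning when reading fails.

```php
<?php
/**
 * Streams through $path with XMLReader and returns the collected
 * LibXMLError objects. An empty array means no syntax error was found.
 */
function xmlSyntaxErrors(string $path): array
{
    // Collect libxml errors internally and start with an empty buffer.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    if (!@$reader->open($path)) {
        libxml_use_internal_errors($previous);
        throw new RuntimeException("Cannot open $path");
    }

    // read() returns false at the end of the document or at the first
    // fatal error, so the loop stops as soon as the parser gives up.
    while (@$reader->read());
    $reader->close();

    $errors = libxml_get_errors();

    // Leave the libxml error state the way it was found.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $errors;
}
```

Each returned error has line, column and message properties, so the position of the corruption can be logged per file. Memory use stays low because XMLReader streams the file, but it still has to read everything up to the first error.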