
I am processing an external XML document using the method described in "How to use XMLReader in PHP?", but I'm running into this error:

...parser error : Entity 'Atilde' not defined in...

and similar errors for other entities, such as cent, acirc, and not.

The error is raised by the $z->expand() call. If I comment that out, it is raised by $z->next() instead.

I know which field causes the problem and have tried to modify it with base64_encode before expanding, but the reader's properties are read-only.

EDIT: the problem string is:

...ââ¬Â...

end edit

Thank you for any help given.

Dave
  • If you know the errors are caused by HTML entities, then you should not use the XML parser. Try DOMDocument instead (like the question you have included). – ajreal Sep 01 '11 at 15:54
  • It's a large XML document, so I can't afford to use up all the memory. – Dave Sep 01 '11 at 15:59

4 Answers

XML only knows the entities lt, gt, amp, apos, and quot. Any other entity reference that is not declared in a DTD will raise an error. (Note that character references, such as &#195;, and entity references, such as &Atilde;, are not the same thing.)
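
A quick way to see the distinction is to feed a few fragments to libxml. This sketch uses DOMDocument purely because loadXML() reports success or failure as a boolean; the helper parses() is a hypothetical name. XMLReader is backed by the same libxml2 parser, so it should reject the same input.

```php
<?php
// Collect libxml errors internally instead of printing warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true if the string is well-formed XML.
function parses(string $xml): bool
{
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    return $ok;
}

// The five predefined entities are always accepted.
var_dump(parses('<r>&lt;&gt;&amp;&apos;&quot;</r>')); // bool(true)

// Character references are not entity references; they are accepted too.
var_dump(parses('<r>&#195;&#xC3;</r>'));              // bool(true)

// A named HTML entity that no DTD declares is a fatal parse error.
var_dump(parses('<r>&Atilde;</r>'));                  // bool(false)
```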

You can use strtr to replace every HTML entity reference that XML does not also know:

$trans = array_map('utf8_encode', array_flip(array_diff(get_html_translation_table(HTML_ENTITIES), get_html_translation_table(HTML_SPECIALCHARS))));
$output = strtr($input, $trans);

get_html_translation_table returns an array that maps characters to entity references. get_html_translation_table(HTML_ENTITIES) returns the mapping for all entities, while get_html_translation_table(HTML_SPECIALCHARS) returns only the basic special characters (&, <, > and, depending on the flags, quotes). array_diff gives the difference, i.e. all entities except those. array_flip inverts the key/value association so that each entity maps to its character, and array_map with utf8_encode converts those characters from ISO 8859-1 to UTF-8.
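
Note that get_html_translation_table has defaulted to UTF-8 since PHP 5.4.0, so on those versions the utf8_encode step would encode already-UTF-8 characters a second time, and utf8_encode itself is deprecated as of PHP 8.2. A minimal sketch that requests UTF-8 directly, assuming the feed is UTF-8; the flags are spelled out because their default changed in PHP 8.1:

```php
<?php
// Pin flags and encoding so the table is identical across PHP versions.
$flags = ENT_QUOTES | ENT_HTML401;

// Map "&name;" => UTF-8 character for every entity XML does not predefine.
$trans = array_flip(array_diff(
    get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8'),
    get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8')
));

// HTML-only entities become characters; &amp; and &lt; are left for the XML parser.
$input  = '<p>&Atilde;&cent;&acirc;&not; &amp; &lt;</p>';
$output = strtr($input, $trans);
echo $output, "\n"; // <p>Ã¢â¬ &amp; &lt;</p>
```

The string passed to strtr has to be the raw XML text, before XMLReader parses it; the expanded DOM node is no longer a string.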

Gumbo
  • What should I use as the input please? I tried it with $z->expand() as the input and got the error: "Catchable fatal error: Object of class DOMElement could not be converted to string in..." – Dave Sep 01 '11 at 16:12
  • Oh, wait a second. Does setting `$z->setProperty(XMLReader::SUBST_ENTITIES, true);` before `$z->open(…)` work? – Gumbo Sep 01 '11 at 16:17
  • It came up with an "undefined" error, but I found setParserProperty, which I guess you meant. Unfortunately it didn't work. Thank you anyway, I appreciate your time and effort. I guess I could always tell the xml feed supplier to fix it, but they'll just ignore me. – Dave Sep 01 '11 at 16:26
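
XMLReader::SUBST_ENTITIES only substitutes entities that the document declares, for example in an internal DTD subset, which would explain why it did not help with an undeclared &Atilde;. A small sketch of both cases; firstText() is a hypothetical helper, and setParserProperty() is called after XML() because it acts on the parser that XML()/open() creates:

```php
<?php
libxml_use_internal_errors(true); // keep parse errors out of the output

// Hypothetical helper: returns the first text node's value, or null.
function firstText(string $xml): ?string
{
    $z = new XMLReader();
    $z->XML($xml);
    $z->setParserProperty(XMLReader::SUBST_ENTITIES, true);
    $text = null;
    while ($z->read()) {
        if ($z->nodeType === XMLReader::TEXT) {
            $text = $z->value;
            break;
        }
    }
    $z->close();
    libxml_clear_errors();
    return $text;
}

// Declared in an internal DTD subset, so it can be substituted.
var_dump(firstText('<!DOCTYPE r [<!ENTITY Atilde "&#195;">]><r>&Atilde;</r>'));

// Declared nowhere: parsing stops with "Entity 'Atilde' not defined".
var_dump(firstText('<r>&Atilde;</r>'));
```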

Maybe xml_set_external_entity_ref_handler is the solution for your case:

http://php.net/manual/en/example.xml-external-entity.php

http://www.php.net/manual/en/function.xml-set-external-entity-ref-handler.php

I encountered the same problem.

My solution was to open the XML file in Notepad++ and search-and-replace the characters with readable ones.

Not a beautiful solution, but it works ;)

mmmmm

This is a flaw in the original XML, but it's not uncommon. I didn't have much luck with the solutions here (other than Wout van der Vegt's), so here's the "make a new XML that is fixed" approach:

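// Needs PHP 5.4.0+ (for ENT_HTML5)

$file = "xmldata_with_entities.xml";
$file2 = "xmldata_converted.xml";

$handle1 = fopen($file, "r");
$handle2 = fopen($file2, "w");
if ($handle1 && $handle2) {
    while (($line = fgets($handle1)) !== false) {
        // Decode only named HTML entities and keep the five XML predefines;
        // decoding &lt; or &amp; too would turn escaped data into raw markup.
        $line = preg_replace_callback('/&([A-Za-z][A-Za-z0-9]*);/', function ($m) {
            if (in_array($m[1], array('lt', 'gt', 'amp', 'apos', 'quot'), true)) {
                return $m[0];
            }
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }, $line);
        fwrite($handle2, $line);
    }
}
if ($handle1) fclose($handle1);
if ($handle2) fclose($handle2);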

Obviously you could then use $file2 in XMLReader.
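
If writing a second file is undesirable, one alternative not mentioned in this thread is a PHP stream filter that applies the strtr table from the answer above while XMLReader reads, so memory stays bounded by the read chunk size. A sketch under stated assumptions: the feed is UTF-8, and HtmlEntityFilter, the filter name xmlfix.html_entities and openWithEntityFix() are all hypothetical names. The filter holds back a trailing unterminated "&..." so that an entity split between two chunks is still translated:

```php
<?php
class HtmlEntityFilter extends php_user_filter
{
    private $trans = array();
    private $carry = ''; // tail of the previous chunk that may be a split entity

    public function onCreate(): bool
    {
        // HTML-only entities -> UTF-8 characters; XML's own five are excluded.
        $flags = ENT_QUOTES | ENT_HTML401;
        $this->trans = array_flip(array_diff(
            get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8'),
            get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8')
        ));
        return true;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        $data = $this->carry;
        $this->carry = '';
        while ($bucket = stream_bucket_make_writeable($in)) {
            $data .= $bucket->data;
            $consumed += $bucket->datalen;
        }

        // Hold back an unterminated "&..." near the end until its ";" arrives.
        $amp = strrpos($data, '&');
        if (!$closing && $amp !== false && strpos($data, ';', $amp) === false
            && strlen($data) - $amp < 32) {
            $this->carry = substr($data, $amp);
            $data = substr($data, 0, $amp);
        }

        if ($data !== '') {
            $bucket = stream_bucket_new($this->stream, strtr($data, $this->trans));
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('xmlfix.html_entities', 'HtmlEntityFilter');

// Open the original file through php://filter instead of converting it first.
function openWithEntityFix(string $path): XMLReader
{
    $z = new XMLReader();
    if (!$z->open('php://filter/read=xmlfix.html_entities/resource=' . $path)) {
        throw new RuntimeException("Cannot open $path");
    }
    return $z;
}
```

With the filter registered, $z = openWithEntityFix($file); can stand in for the conversion step, and the existing read()/expand() loop stays unchanged.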

Paul Gregory