
I have created a WordPress/WooCommerce plugin that generates an XML file from our products.

However, some rows contain illegal characters, and parsing the XML fails with:

error on line 15622 at column 22: Input is not proper UTF-8, indicate encoding !
Bytes: 0x03 0xC3 0xB6 0x73
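Splitting the reported bytes into UTF-8 characters suggests that only the first one is a problem: 0xC3 0xB6 is the two-byte UTF-8 encoding of "ö" and 0x73 is "s". A minimal sketch, assuming the bytes are copied verbatim from the error message:

```php
<?php
// The four bytes quoted in the libxml error message.
$bytes = "\x03\xC3\xB6\x73";

// Split into UTF-8 characters; the /u modifier makes PCRE treat the
// subject as UTF-8, so multi-byte sequences stay together.
$chars = preg_split('//u', $bytes, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $ch) {
    // Prints each character as hex: 03, c3b6 (ö), 73 (s).
    echo bin2hex($ch), PHP_EOL;
}
```

Because the whole sequence is valid UTF-8, the "not proper UTF-8" wording appears to be misleading here: the encoding itself looks fine, and the suspect is the lone U+0003 control character.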

How can I solve this, so the XML is parsed correctly?


The generating code looks roughly like this:

$dom = new DOMDocument('1.0', 'UTF-8');

// create root element
$root = $dom->createElement("termeklista");
$dom->appendChild($root);
$dom->formatOutput=true;

Then a while loop fills in the data. The problem is in the description element:

// DESCRIPTION

$description = $dom->createElement("leiras");
$producta->appendChild($description);
// create CDATA section
$cdata = $dom->createCDATASection("\n".$loop->post->post_excerpt."\n");
$description->appendChild($cdata);
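A likely reason the CDATA wrapper does not help: CDATA only switches off markup recognition for characters such as < and &, it does not widen the set of characters XML allows. The sketch below is not the plugin's code; the sample text is made up and only the element name leiras is borrowed from the question. It puts a control character into a CDATA section with DOMDocument and then re-parses the generated output:

```php
<?php
// Build a tiny document the same way as the feed generator.
$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->createElement('leiras');
$dom->appendChild($root);

// Hypothetical excerpt containing U+0003 (ETX) before "ös".
$excerpt = "Foo \x03\xC3\xB6s";
$root->appendChild($dom->createCDATASection($excerpt));

// DOMDocument does not validate CDATA content, so the control
// character is written to the output unchanged.
$xml = $dom->saveXML();

// Re-parse the generated output, collecting errors instead of warnings.
libxml_use_internal_errors(true);
$check = new DOMDocument();
$ok = $check->loadXML($xml);
var_dump($ok); // false: the output is not well-formed
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();
```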

I have tried iconv, utf8_encode, and a custom function to replace the bad characters, but I cannot figure out what the issue is.

The WooCommerce product post excerpt does not have any illegal characters in it.
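A possible explanation for why iconv made no difference: U+0003 is a valid character in both ASCII and UTF-8, so an encoding conversion passes it through untouched, and control characters like it are usually invisible when the excerpt is viewed in an editor or browser. A minimal check, assuming the iconv extension is loaded (the sample excerpt is made up):

```php
<?php
// Hypothetical excerpt containing an ETX (0x03) before "ös".
$excerpt = "Foo \x03\xC3\xB6s";

// preg_match() with the /u modifier returns false on malformed UTF-8;
// here it returns 1, so the string is already valid UTF-8.
var_dump(preg_match('//u', $excerpt)); // int(1)

// A UTF-8 to UTF-8 conversion has nothing to repair, so 0x03 survives.
$converted = iconv('UTF-8', 'UTF-8', $excerpt);
var_dump($converted === $excerpt);     // bool(true)
```

utf8_encode() (deprecated since PHP 8.2) is no better here: it assumes ISO-8859-1 input, so it also leaves 0x03 alone while double-encoding the already-UTF-8 "ö".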

kjhughes
beamkiller
  • I recommend adding a tag for whatever language this is in (PHP I think). You will likely get more views that way. – ug_ May 26 '16 at 01:03

2 Answers


0x03 (aka ^C, aka ETX, "end of text") is not an allowed character in XML. The XML 1.0 Char production permits only:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
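The production translates directly into a code-point predicate. A small sketch; the function name isXmlChar is hypothetical:

```php
<?php
/**
 * Returns true if the Unicode code point $cp matches the XML 1.0
 * Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
function isXmlChar(int $cp): bool
{
    return $cp === 0x9
        || $cp === 0xA
        || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

var_dump(isXmlChar(0x03)); // bool(false): ETX, the byte from the error message
var_dump(isXmlChar(0xF6)); // bool(true): "ö"
```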

Therefore your data is not XML, and any conformant XML processor must report an error such as the one you received.

You must repair the data by treating it as text, not XML, and removing any illegal characters, manually or automatically, before using it with any XML library.
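One common way to do this automatically in PHP is a single preg_replace() that deletes every character outside the Char production. This is a sketch, not the stripInvalidXML() function the asker eventually used; the name stripXmlInvalidChars is hypothetical:

```php
<?php
/**
 * Removes every character not allowed by the XML 1.0 Char production.
 * Returns null if $text is not valid UTF-8 (preg_replace fails on
 * malformed input when the /u modifier is used).
 */
function stripXmlInvalidChars(string $text): ?string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    return preg_replace($pattern, '', $text);
}

// Usage: clean a (made-up) excerpt before putting it into a CDATA section.
$excerpt = "Foo \x03\xC3\xB6s\n";
echo stripXmlInvalidChars($excerpt); // prints "Foo ös" and a newline
```

In the feed generator this would wrap $loop->post->post_excerpt before it is passed to createCDATASection(). A null return signals malformed UTF-8, which needs separate handling (for example an encoding conversion first).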

kjhughes
  • Yeah, but why is it happening? If I remove that product (inofolic), then the error appears on another line. So it is not related to the content of the post excerpt. – beamkiller May 26 '16 at 07:37
  • The error is happening precisely because a character outside of the acceptable range exists in your data. If you're still receiving the error after removing portions of your data, then you're missing an occurrence of the offending character. – kjhughes May 26 '16 at 18:19
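To check whether an occurrence has been missed, it can help to list every offending character with its position instead of relying on the first parser error. A sketch (the function name findXmlInvalidChars is hypothetical) that reports the line, byte offset, and bytes of each one:

```php
<?php
/**
 * Lists characters outside the XML 1.0 Char production.
 * Each entry is [line (1-based), byte offset (0-based), hex bytes].
 * Assumes $text is valid UTF-8; returns an empty list otherwise.
 */
function findXmlInvalidChars(string $text): array
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        return []; // no matches (0) or malformed UTF-8 (false)
    }
    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        // PREG_OFFSET_CAPTURE offsets are in bytes; count newlines before the match.
        $line = substr_count(substr($text, 0, $offset), "\n") + 1;
        $found[] = [$line, $offset, bin2hex($char)];
    }
    return $found;
}

print_r(findXmlInvalidChars("ok\nbad \x03 here\n\x1F"));
```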

So, I was able to solve the issue with the stripInvalidXML() function from this question. Thanks to the author. The XML is now valid.

stripInvalidXML from file

Community
beamkiller