
The issue: Receiving bad XML through the web / apps / file exchanges.

I was receiving XML responses over HTTP GET that sometimes contained invalid XML characters in the text.

The SUB control character (U+001A) was showing up in the text, and the 'Get data from XML' step failed to read it with the error: "An invalid XML character (Unicode: 0x1a) was found in the element content of the document."

Cristian Curti

1 Answer


The solution was found in this post.

I did not need the entire JavaScript function from that answer, only the part that matches characters that are not allowed in XML.

What I did was a simple replace() call in the 'Modified Java Script Value' step:

var str = result.replace(/[^\x09\x0A\x0D\x20-\xFF\x85\xA0-\uD7FF\uE000-\uFDCF\uFDE0-\uFFFD]/gm,'');

This stripped the bad characters from the whole XML and made all of it readable.
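For reference, here is a minimal sketch of the same idea wrapped in a function. It is not the code from the linked post: the name `stripInvalidXmlChars` is made up for illustration, and the pattern follows the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF) rather than the exact ranges used above. One difference worth noting is that valid surrogate pairs (characters above U+FFFF, such as emoji) are kept, while lone surrogates are dropped. It avoids the `u` regex flag on the assumption that the scripting engine behind the step may not support it.

```javascript
// Hypothetical helper: removes characters that XML 1.0 does not allow.
function stripInvalidXmlChars(text) {
  // First alternative: a valid surrogate pair (a character above U+FFFF).
  // Second alternative: any single code unit outside the allowed ranges,
  // which also catches lone surrogates and control characters like 0x1A.
  var pattern = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]/g;
  return text.replace(pattern, function (match) {
    // Keep valid pairs (two code units); drop everything else that matched.
    return match.length === 2 ? match : '';
  });
}
```

In the step it would be called as `var str = stripInvalidXmlChars(result);`, assuming `result` is the incoming field that holds the raw response, as in the one-liner above.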

Cristian Curti
  • Glad it worked for you, but do be aware that any technique for repairing ill-formed XML will work on some corrupt instances and fail on others; it's very specific to the type of corruption. It's always best to investigate the root cause and fix the program that's generating the bad XML. – Michael Kay Mar 13 '19 at 08:48
  • Well yes, but in my case there is no way for me to fix the web service that's generating the XML because it belongs to another company; we just consume it, so I had to come up with a fix on my side. – Cristian Curti Mar 13 '19 at 13:07
  • I'm constantly amazed that people are prepared to consume content from a site that delivers ill-formed XML. It's like consuming infected meat. Avoid it like the plague. – Michael Kay Mar 13 '19 at 16:06