I need to read some XML from a third-party source. None of their text fields are wrapped in CDATA tags, and they can't guarantee that the values won't include characters that are invalid in XML: I'm particularly thinking of ampersands. They also won't add CDATA tags, because that might break things for their existing clients. Is there a parser out there that can handle this?
-
If it's not XML, then you can't read it with an XML parser. Your 3rd party should stop lying about the fact they send XML - clearly, they don't send XML. – John Saunders May 09 '11 at 20:43
-
Any chance that, with reasonable robustness and effort, you can make the invalid XML "valid" before using a standard XML parser? – Christian.K May 10 '11 at 08:45
1 Answer
Assuming the invalid characters appear as numeric character references (for example &#0;) rather than literally in the XML, you can read the data with the .NET library by creating an XmlTextReader with its Normalization property set to false, which also turns off character range checking for numeric entities. See http://msdn.microsoft.com/en-us/library/system.xml.xmltextreader.normalization.aspx

phoog
-
(Then the data is not, strictly speaking, XML.) I would run the stream through a function that replaces the illegal characters with the proper escape sequences. – phoog May 09 '11 at 23:04
-
Close enough... I was thinking that would cause problems because some characters might already be escaped. But I think I can just add CDATA tags instead. – joelt May 12 '11 at 19:53
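
The pre-processing idea from the comments, and the worry that some characters might already be escaped, can be illustrated with a small sketch. It is written in Python rather than .NET purely for illustration, and the function name `escape_bare_ampersands` and the sample data are hypothetical. It assumes the only problem is literal ampersands: an `&` is escaped only when it does not already begin something that looks like an entity or character reference. It does not handle stray `<` characters, and text such as `AT&T;` would be mistaken for an entity and left alone.

```python
import re
from xml.etree import ElementTree

# Match "&" only when it is NOT the start of something that looks like a
# named entity (&amp;), a decimal reference (&#38;), or a hex reference (&#x26;).
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape literal ampersands, leaving existing references untouched."""
    return _BARE_AMP.sub("&amp;", text)


# Hypothetical feed content mixing a bare "&" with already-escaped forms.
raw = "<item><name>Tom & Jerry &amp; friends &#38; co</name></item>"
fixed = escape_bare_ampersands(raw)

# After the clean-up, a standard parser accepts the document.
root = ElementTree.fromstring(fixed)
print(root.find("name").text)
```

Because already-escaped sequences are skipped, running the function twice gives the same result as running it once, which is what makes it safe on feeds where only some values are escaped.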