I am downloading some XML content from the Adobe Connect API. I am loading the content into a XDocument and reading through all of the sco elements to save them to the database. However, one of the calls to the API contains an invalid character that gives the exception:
System.Xml.XmlException: '', hexadecimal value 0x0B, is an invalid character. Line 2, position 6495.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.ParseText()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Load(XmlReader reader)
at ACRS.DataRefresherApp.Program.GetFolderContents(Folder parentFolder, AcrsDbContext db) in xxx:line 164
Here is a sample of the XML coming from the Adobe Connect API. Note: this example does not contain an invalid character.
<?xml version="1.0"?>
<results>
<status code="ok"/>
<scos>
<sco is-folder="1" duration="" display-seq="0" icon="folder" type="folder" folder-id="xx" source-sco-id="" sco-id="xx">
<name>Shared Templates</name>
<url-path>/f1101964883/</url-path>
<date-created>2010-09-16T15:21:15.993+10:00</date-created>
<date-modified>2013-12-11T22:31:05.130+11:00</date-modified>
<is-seminar>false</is-seminar>
</sco>
.....
</scos>
</results>
Here is the code I am using to read/load the XML data.
Stream responseStream = response.GetResponseStream();
XmlReader xmlReader = XmlReader.Create(responseStream, new XmlReaderSettings() { CheckCharacters = false });
var xmlResponse = XDocument.Load(xmlReader);
var folders = xmlResponse.Elements("results").Elements("scos").Elements("sco").ToList();
The exception occurs when the XDocument attempts to load the data from the xmlReader.
var xmlResponse = XDocument.Load(xmlReader);
I realise that I do not need to use the XmlReader and can load the XDocument directrly from the stream. However, I have included the XmlReader in response to this blog post by Paul Selles.
I have already read this thread: How to prevent System.Xml.XmlException: Invalid character in the given encoding
However, this does not fix my problem. Apparently, XML standards cause the reader to default to the declared document encoding once the document is being read. In the case of my document where no declaration is being made, it should default to UTF-8. See this answer.