
I am downloading some XML content from the Adobe Connect API. I load the content into an XDocument and read through all of the sco elements to save them to the database. However, the response to one of the API calls contains an invalid character, which causes this exception:

```
System.Xml.XmlException: '', hexadecimal value 0x0B, is an invalid character. Line 2, position 6495.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.ParseText()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Load(XmlReader reader)
at ACRS.DataRefresherApp.Program.GetFolderContents(Folder parentFolder, AcrsDbContext db) in xxx:line 164
```
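For reference, 0x0B is U+000B (vertical tab). The XML 1.0 `Char` production allows only tab, line feed and carriage return below U+0020, so U+000B is not legal anywhere in a document, whether written literally or as `&#x0B;`. A minimal sketch for locating such characters, assuming the response has already been read into a string. `XmlCharScanner` and `FindInvalidXmlChars` are hypothetical names; `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair` are standard .NET methods (available since .NET Framework 4.0).

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original code.
public static class XmlCharScanner
{
    // Returns the zero-based indexes of characters that XML 1.0 does not allow,
    // such as U+000B from the exception above.
    public static List<int> FindInvalidXmlChars(string text)
    {
        var invalid = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
                continue;

            // A supplementary character (e.g. an emoji) is a high surrogate followed
            // by a low surrogate. IsXmlSurrogatePair takes (lowChar, highChar).
            if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                i++; // skip the low surrogate too
                continue;
            }

            invalid.Add(i);
        }
        return invalid;
    }
}
```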

Here is a sample of the XML coming from the Adobe Connect API. Note: this example does not contain an invalid character.

```xml
<?xml version="1.0"?>
<results>
    <status code="ok"/>
    <scos>
        <sco is-folder="1" duration="" display-seq="0" icon="folder" type="folder" folder-id="xx" source-sco-id="" sco-id="xx">
            <name>Shared Templates</name>
            <url-path>/f1101964883/</url-path>
            <date-created>2010-09-16T15:21:15.993+10:00</date-created>
            <date-modified>2013-12-11T22:31:05.130+11:00</date-modified>
            <is-seminar>false</is-seminar>
        </sco>
        .....
    </scos>
</results>
```

Here is the code I am using to read/load the XML data.

```csharp
Stream responseStream = response.GetResponseStream();
XmlReader xmlReader = XmlReader.Create(responseStream, new XmlReaderSettings() { CheckCharacters = false });
var xmlResponse = XDocument.Load(xmlReader);
var folders = xmlResponse.Elements("results").Elements("scos").Elements("sco").ToList();
```

The exception occurs when the XDocument attempts to load the data from the xmlReader.

```csharp
var xmlResponse = XDocument.Load(xmlReader);
```
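The stack trace shows the exception coming from `XmlTextReaderImpl.ParseText` even though `CheckCharacters = false` is set. The sketch below is an assumed minimal repro, not the Adobe Connect response itself: it contrasts a literal U+000B in element text with the same character written as the character reference `&#x0B;`. `CheckCharactersDemo` and `LoadFails` are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical repro helper, not part of the original code.
public static class CheckCharactersDemo
{
    // Same reader settings as in the question.
    static readonly XmlReaderSettings Settings = new XmlReaderSettings { CheckCharacters = false };

    // Returns true if XDocument.Load rejects the input with an XmlException.
    public static bool LoadFails(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), Settings))
            {
                XDocument.Load(reader);
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

Under this assumption, `LoadFails("<name>a\vb</name>")` (literal character) returns `true`, while `LoadFails("<name>a&#x0B;b</name>")` (character reference) returns `false`, which is consistent with dbc's comment about what `CheckCharacters = false` covers.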

I realise that I do not need the XmlReader and could load the XDocument directly from the stream. However, I have used the XmlReader based on this blog post by Paul Selles.

I have already read this thread: How to prevent System.Xml.XmlException: Invalid character in the given encoding

However, this does not fix my problem. Apparently, under the XML standard, the reader switches to the encoding named in the document's XML declaration once it starts reading the document. My document's declaration does not specify an encoding, so it should default to UTF-8. See this answer.
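One reasoning step on the encoding theory: 0x0B lies in the ASCII range, and ASCII-compatible encodings such as UTF-8, ASCII and ISO-8859-1 all decode the byte 0x0B to the same character, U+000B. So, assuming the response uses one of these encodings, choosing a different decoder would not be expected to make this character disappear; the byte itself is in the response. A small sketch illustrating this (`EncodingDemo` and `DecodeVerticalTab` are hypothetical names):

```csharp
using System.Text;

// Hypothetical illustration, not part of the original code.
public static class EncodingDemo
{
    // Decodes the single raw byte 0x0B with several ASCII-compatible encodings.
    public static char[] DecodeVerticalTab()
    {
        byte[] raw = { 0x0B };
        return new[]
        {
            Encoding.UTF8.GetString(raw)[0],
            Encoding.ASCII.GetString(raw)[0],
            Encoding.GetEncoding("iso-8859-1").GetString(raw)[0],
        };
    }
}
```

Every element of the returned array is `'\v'` (U+000B).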

  • @dbc I have done as you requested. However, the example XML document above is not the XML giving me the error. It is just an example of one of the successful API calls using the same code as above. – Robert Aug 02 '16 at 00:25
  • I should have mentioned that the particular Adobe Connect installation is accessed in several countries where different character sets are used. So even though most of the content is in English, a certain percentage is in different languages. The character is displayed on my screen as an accented c. – Robert Aug 02 '16 at 00:45
  • Then maybe the response stream isn't utf8. Try setting the encoding as shown here: [Is it possible to get data from web response in a right encoding](https://stackoverflow.com/questions/7634113/is-it-possible-to-get-data-from-web-response-in-a-right-encoding) – dbc Aug 02 '16 at 00:58
  • I have tried this and it hasn't made any difference. :( – Robert Aug 02 '16 at 01:09
  • Then take a look at [Escape invalid XML characters in C#](https://stackoverflow.com/questions/8331119/escape-invalid-xml-characters-in-c-sharp) and [XML (de)serialization invalid string inconsistent in c#?](https://stackoverflow.com/questions/13450117). `CheckCharacters = false` only works for invalid *character entity references*, see [XmlReaderSettings CheckCharacters=false doesn't seem to work](https://stackoverflow.com/questions/21509569). – dbc Aug 03 '16 at 00:05
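One workaround in the spirit of the linked questions is to clean the text before it reaches the parser. A minimal sketch, assuming the whole response fits comfortably in memory and that silently dropping illegal characters such as U+000B is acceptable. `XmlSanitizer`, `RemoveInvalidXmlChars` and `LoadSanitized` are hypothetical names, not part of the Adobe Connect API or .NET.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original code.
public static class XmlSanitizer
{
    // Copies the text, dropping characters that are not legal in XML 1.0.
    // Valid surrogate pairs (supplementary characters) are kept.
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // High surrogate followed by low surrogate: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (e.g. U+000B, a lone surrogate) is dropped.
        }
        return sb.ToString();
    }

    // Reads the whole response, removes illegal characters, then parses it.
    public static XDocument LoadSanitized(TextReader reader)
    {
        string raw = reader.ReadToEnd();
        return XDocument.Parse(RemoveInvalidXmlChars(raw));
    }
}
```

With the code from the question this could be called as `XmlSanitizer.LoadSanitized(new StreamReader(response.GetResponseStream()))`. `StreamReader` decodes as UTF-8 unless it detects a byte order mark, which matches the UTF-8 default discussed above. Dropping characters is lossy; replacing them with a placeholder such as `'?'` is an alternative if the position of the character matters.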
