I want to parse 'invalid' XML using a streaming XML parser. I have two options. The first:

XmlReader.Create(...,
    new XmlReaderSettings
    {
        CheckCharacters = false,
        ConformanceLevel = ConformanceLevel.Fragment,
        ValidationFlags = System.Xml.Schema.XmlSchemaValidationFlags.None,
        ValidationType = ValidationType.None
    });

The second:

new XmlTextReader(...) { Namespaces = false, Normalization = false };

The first fails on undeclared namespace prefixes that are present in the XML: '...' is an undeclared prefix.

The second fails on invalid characters: XmlException: '', hexadecimal value 0x13, is an invalid character. Line ...

Is there an option to combine both behaviors (Namespaces = false && CheckCharacters = false) so that parsing does not fail on undeclared namespace prefixes or invalid characters?

The input "XML" is provided as is and cannot be changed. It is also huge and cannot be loaded into memory.

Update: XML example

<?xml version="1.0" encoding="UTF-8"?>
<x xmlns="http://www.w3.org/2005/Atom">
    <item>
        <my_ns:id>123 _0x13_here_ dd</my_ns:id>
        <other_ns:value>ABC</other_ns:value>
    </item>
</x>

Here _0x13_here_ stands for a literal (char)'\x13'. I was wrong: CheckCharacters = false does not help here. It only avoids exceptions for character references such as &#x13;.

Mekanik
  • XML with undeclared namespace prefixes is not 'invalid', it's not well formed. I don't think you'll have much luck with any parser reading it, really. Disabling namespace support in `XmlTextReader` will just mean you get errors that `:` isn't allowed in names. – Charles Mager Jun 09 '16 at 08:15
  • @CharlesMager I've added an example of the xml. `XmlTextReader { Namespaces = false }` is able to read an xml with that type of namespaces without exceptions. – Mekanik Jun 09 '16 at 20:14
  • Please look at my answer to the similar question [here](https://stackoverflow.com/questions/28339811/how-do-i-create-a-xmltextreader-that-ignores-namespaces-and-does-not-check-chara) – Alterant Oct 31 '17 at 20:25

1 Answer

Here is a solution that handles both:
- multiple root elements (ConformanceLevel.Fragment)
- undeclared prefixes (AddNamespace)

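var settings = new XmlReaderSettings
{
    NameTable = new NameTable(),
    // Allow more than one root element in the input
    ConformanceLevel = ConformanceLevel.Fragment
};
// Register every prefix the input uses without declaring it
// (the prefix and URI below are placeholders)
var nsmgr = new XmlNamespaceManager(settings.NameTable);
nsmgr.AddNamespace("MyNamespace", "http://exemple.com");
// The parser context hands the pre-registered prefixes to the reader
var context = new XmlParserContext(null, nsmgr, null, XmlSpace.Default);
var reader = XmlReader.Create(stream, settings, context);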
Nicolas
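
The answer above takes care of undeclared prefixes, but the raw 0x13 character from the question would still be rejected. One possible way to handle both while still streaming is to filter the characters before the parser sees them. The sketch below is only an illustration under assumptions: `XmlSanitizingReader` and `Demo` are hypothetical names (not part of .NET), forbidden characters are replaced with a space (which changes the data slightly), and `XmlTextReader { Namespaces = false }` is used for the prefixes, as the question's author reports it reads such names without exceptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of .NET: wraps a TextReader and replaces
// characters that XML 1.0 forbids (for example 0x13) with a space.
public sealed class XmlSanitizingReader : TextReader
{
    private readonly TextReader _inner;

    public XmlSanitizingReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // XML 1.0 allows tab, LF, CR and 0x20..0xFFFD. Surrogate code units are
    // passed through unchanged so that valid pairs survive.
    private static bool IsAllowed(char c) =>
        c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c <= 0xFFFD);

    // Leaves end-of-input (-1) and allowed characters untouched.
    private static int Clean(int c) => c < 0 || IsAllowed((char)c) ? c : ' ';

    public override int Peek() => Clean(_inner.Peek());

    public override int Read() => Clean(_inner.Read());

    public override int Read(char[] buffer, int index, int count)
    {
        int read = _inner.Read(buffer, index, count);
        for (int i = index; i < index + read; i++)
        {
            if (!IsAllowed(buffer[i])) buffer[i] = ' ';
        }
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Streams through the input node by node and collects element names.
    public static List<string> ElementNames(TextReader input)
    {
        var names = new List<string>();
        // Namespaces = false: prefixed names are read as plain names.
        using (var xml = new XmlTextReader(new XmlSanitizingReader(input)) { Namespaces = false })
        {
            while (xml.Read())
            {
                if (xml.NodeType == XmlNodeType.Element) names.Add(xml.Name);
            }
        }
        return names;
    }

    public static void Main()
    {
        // Shortened version of the XML from the question, with a raw 0x13.
        string sample = "<x xmlns=\"http://www.w3.org/2005/Atom\"><item>" +
                        "<my_ns:id>123 \u0013 dd</my_ns:id>" +
                        "<other_ns:value>ABC</other_ns:value></item></x>";
        foreach (string name in ElementNames(new StringReader(sample)))
        {
            Console.WriteLine(name);
        }
    }
}
```

For a large file you would pass something like `new StreamReader(path)` instead of the `StringReader`, so only the reader's buffer is held in memory. If the prefixes are known in advance, the same wrapper should also work in front of `XmlReader.Create(textReader, settings, context)` from the answer above.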