My Code:

using (XmlTextReader inputReader = new XmlTextReader(xml, XmlNodeType.Document, new XmlParserContext(null, null, "en", XmlSpace.Default)))
{
    XsltArgumentList arglist = new XsltArgumentList();
    GetXSLT().Transform(inputReader, arglist, outputStream);
}

The `XmlTextReader` is created fine, but inside the XML there is an entity reference for a vertical tab (U+000B), which is not a valid character in XML 1.0.

The line that throws is the call to `Transform`. The exception says that there is an invalid XML character (the vertical tab, of course).

I've tried using the approach referenced in the following article:
Escape invalid XML characters in C#

My question is: how can I remove or ignore the invalid characters using the .NET framework like the link states?

Note: I'm looking for a way that doesn't involve hard-coding a list of entity references to replace (I'm already doing this, it is horrible, and I feel bad, as I should).

Nateous
  • You can try [ignoring it](http://stackoverflow.com/a/2272525/11683) instead of removing. – GSerg Feb 02 '15 at 16:43
  • I tried but it still throws the same exception – Nateous Feb 02 '15 at 18:06
  • You are ignoring them while reading, you should also ignore them while writing. – GSerg Feb 02 '15 at 19:05
  • you are right, I just got that – Nateous Feb 02 '15 at 19:58
  • i'll mark as Answer if you can post a nice way to use `var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();` to get the characters removed – Nateous Feb 02 '15 at 20:03
  • Is the `XmlDoctor` here any help? https://stackoverflow.com/questions/27925128/removing-invalid-characters-from-xml-file-before-deserialization/27976613#27976613 – dbc Feb 03 '15 at 01:42
  • @dbc that is a lot of code (too much) and it appears that it is hard coded in terms of which characters it is looking to replace, I'd rather use a regex. I am looking for a solution that relies on the MS .NET framework to tell which characters it needs to replace. – Nateous Feb 03 '15 at 14:46
  • @GSerg ignoring might end up being a better solution, so far it seems to be working. I'm still testing it (has to go through a database, web page displaying, printed materials, etc.) – Nateous Feb 03 '15 at 14:47
  • @GSerg I think your solution is best. put in an answer and I'll mark it as the answer. otherwise I'll post my code to close this loop, thanks for your help. – Nateous Feb 04 '15 at 14:05

1 Answer

Try ignoring invalid XML characters both while reading and writing:

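// CheckCharacters = false stops the reader from rejecting characters outside the legal XML range.
var readerSettings = new XmlReaderSettings() { CheckCharacters = false, ConformanceLevel = ConformanceLevel.Document };

// Note: if `xml` is a string of markup rather than a Stream or TextReader, wrap it in a StringReader,
// because XmlReader.Create(string, ...) treats a string argument as a URI.
using (var inputReader = XmlReader.Create(xml, readerSettings, new XmlParserContext(null, null, "en", XmlSpace.Default)))
{
    XsltArgumentList arglist = new XsltArgumentList();
    var xslt = GetXSLT();

    // Start from the stylesheet's own output settings, then relax character checking on the way out too.
    var writerSettings = xslt.OutputSettings.Clone();
    writerSettings.CheckCharacters = false;

    using (var outputWriter = XmlWriter.Create(outputStream, writerSettings))
    {
        xslt.Transform(inputReader, arglist, outputWriter);
    }
}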
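
If removing the characters is preferred over ignoring them (as asked in the comments), a rough sketch along the following lines lets `XmlConvert.IsXmlChar` decide what is invalid instead of a hard-coded list. The names `XmlSanitizer` and `StripInvalidXmlChars` are hypothetical, and the sketch assumes the XML is available as a string before parsing. It handles both raw characters and numeric character references, but it does not distinguish CDATA sections or comments, so a literal reference inside one of those would be removed as well.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of the .NET framework.
public static class XmlSanitizer
{
    // Matches numeric character references such as &#11; or &#xB;
    private static readonly Regex CharRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));", RegexOptions.Compiled);

    public static string StripInvalidXmlChars(string xml)
    {
        // Pass 1: drop numeric character references that resolve to an invalid character.
        string withoutBadRefs = CharRef.Replace(xml, m =>
        {
            int codePoint;
            bool parsed = m.Groups[1].Success
                ? int.TryParse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(m.Groups[2].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);

            // Values that do not fit in an int are treated as invalid.
            return parsed && IsValidCodePoint(codePoint) ? m.Value : string.Empty;
        });

        // Pass 2: drop raw invalid characters, keeping well-formed surrogate pairs.
        var result = new StringBuilder(withoutBadRefs.Length);
        for (int i = 0; i < withoutBadRefs.Length; i++)
        {
            char ch = withoutBadRefs[i];
            if (i + 1 < withoutBadRefs.Length && char.IsSurrogatePair(ch, withoutBadRefs[i + 1]))
            {
                result.Append(ch).Append(withoutBadRefs[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (XmlConvert.IsXmlChar(ch))
            {
                result.Append(ch);
            }
        }
        return result.ToString();
    }

    private static bool IsValidCodePoint(int codePoint)
    {
        if (codePoint >= 0x10000 && codePoint <= 0x10FFFF)
        {
            return true; // supplementary planes are valid XML characters
        }
        return codePoint >= 0 && codePoint <= 0xFFFF && XmlConvert.IsXmlChar((char)codePoint);
    }
}
```

The cleaned string could then be wrapped in a `StringReader` and passed to `XmlReader.Create` with the default `CheckCharacters = true`, so anything the sketch misses still surfaces as an exception rather than passing through silently.
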
GSerg
  • Thanks! I'll have to review what `ConformanceLevel = ConformanceLevel.Document` does to see if I need to add that to mine. – Nateous Feb 04 '15 at 16:04
  • @Nate My understanding was that it does the same as your `XmlNodeType.Document` parameter for the `XmlTextReader`'s constructor. – GSerg Feb 04 '15 at 16:05