
So I have some XML:

<key>my tag</key><value>my tag value &#xB;and my invalid Character</value>

and an XMLReader:

using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
     while (reader.Read())
     {
         //do my thing
     }
}

I have implemented the CleanInvalidCharacters method from here, but because "&#xB;" is still a character reference in the raw string and has not yet been decoded into the actual character, it doesn't get removed.

The error is being thrown at the reader.Read(); line with exception:

hexadecimal value 0x0B, is an invalid character.

JKennedy

1 Answer


The problem is that you don't have XML -- you have some string that sure looks like XML but unfortunately doesn't really qualify. Fortunately you can tell XmlReader to be more lenient:

using (XmlReader reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings { CheckCharacters = false }))
{
     while (reader.Read())
     {
         //do my thing
     }
}

Note that you will still end up with XML that, when serialized, might produce problems further down the line, so you may wish to filter the characters out afterwards anyway as you're reading it.
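A minimal sketch of that filtering step, under some assumptions: `StripInvalidXmlChars` is a hypothetical stand-in for the `CleanInvalidCharacters` method mentioned in the question (its real implementation isn't shown here), and the sample string wraps the question's fragment in an assumed `<root>` element so that it has a single root. It relies on `XmlConvert.IsXmlChar` to decide which characters to keep.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;

public static class LenientXml
{
    // Hypothetical helper (not the question's CleanInvalidCharacters):
    // keeps only characters for which XmlConvert.IsXmlChar returns true.
    // Note: this simple per-char filter also drops surrogate pairs.
    public static string StripInvalidXmlChars(string text) =>
        new string(text.Where(XmlConvert.IsXmlChar).ToArray());

    public static void Main()
    {
        // Assumed sample input: the question's fragment inside a single root element.
        string xml = "<root><key>my tag</key>" +
                     "<value>my tag value &#xB;and my invalid Character</value></root>";

        // CheckCharacters = false lets the reader accept the &#xB; reference.
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    // By this point the reference has been decoded to U+000B,
                    // so a character-level filter can remove it.
                    Console.WriteLine(StripInvalidXmlChars(reader.Value));
                }
            }
        }
    }
}
```

The same helper could be applied to attribute values if the input has any; whether to drop the character or replace it with something like a space depends on what the downstream consumer expects.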

Jeroen Mostert
  • How would I filter out values further down the line? Would I do it in the while loop? Encode it as XML and remove invalid characters? – JKennedy Oct 14 '14 at 10:19
  • You can use the `CleanInvalidCharacters` approach mentioned in your original post on the text nodes, element and attribute values (as you encounter them in the while loop, indeed). It will work now since the characters have already been decoded. – Jeroen Mostert Oct 14 '14 at 10:23
  • Perfect, I was trying too many things in the last 2 hours without any luck. – AhammadaliPK Jul 28 '20 at 08:27