I get the error specified in the title. Notice that I have used XmlConvert.IsXmlChar, as suggested in another thread, but I still get the error.

XDocument GetXDocument(string fileName, CloudBlobDirectory folder)
{
    CloudBlockBlob blob = folder.GetBlockBlobReference(fileName);

    using (var memoryStream = new MemoryStream())
    {   
        blob.DownloadToStream(memoryStream);
        memoryStream.Seek(0, SeekOrigin.Begin);

        StreamReader sr = new StreamReader(memoryStream, Encoding.UTF8);
        var xml = sr.ReadToEnd();
        var validXmlChars = xml.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
        xml = new string(validXmlChars);

        return XDocument.Parse(xml);
    }
}

This is the line where it fails:

<Property Name="Printkode" Type="String" Access="ReadWrite" Value="%% d2m*DOKSTART|&#xB;d2m*OVERSKRIFT:&quot;xx&quot;|&#xB;d2m*CPR:&quot;xxx&quot;" />
Thomas Segato
  • XML readers in .NET can't read invalid XML. Fixing the text to be valid XML before parsing is your only option. Note that you need to fix actual invalid characters (like "\u000b") as well as escaped ones like the "&#xB;" shown in the post. – Alexei Levenkov May 17 '21 at 18:11
  • The individual characters `&#xB;` are all valid, but together they represent a single vertical tab, which is not valid. – juharr May 17 '21 at 18:11
  • What does the `Value` attribute represent? You should consider a different encoding. – Dour High Arch May 17 '21 at 18:20
  • Is there anything built in that can remove the invalid parts? The encoding is the same as the one specified at the top of the XML. – Thomas Segato May 17 '21 at 18:25
  • How do you find all the invalid vertical tabs? – Thomas Segato May 17 '21 at 19:01
  • See [this answer](https://stackoverflow.com/a/28152666/2557128). It appears .Net 4.5 (and I assume .Net Core) implement XML 1.0 and not 1.1. – NetMage May 18 '21 at 20:22
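
To make the question about a built-in option concrete, here is a minimal sketch of one possibility: `XmlReaderSettings.CheckCharacters`, which, when set to `false`, turns off validation of character entity references such as `&#xB;`. The class name `LenientXml` is hypothetical and only for illustration, and this sketch assumes the only problem in the input is escaped references like the one in the failing line.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original code.
public static class LenientXml
{
    // Loads XML into an XDocument while tolerating character references
    // (for example &#xB;) that XML 1.0 does not allow.
    public static XDocument Parse(string xml)
    {
        // CheckCharacters = false disables range checking of character
        // entity references; other well-formedness rules still apply.
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

With this approach the vertical tab stays in the attribute value, so writing the document back out would probably need an `XmlWriter` created with `CheckCharacters = false` as well. If the character should not survive at all, stripping it before parsing is the safer route.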

2 Answers

Since there is no answer yet, I would like to suggest one approach. The key idea is to use CDATA, which marks a passage as character data that the parser does not interpret, so a sequence like &#xB; inside it is treated as literal text rather than as a character reference. I built a simple XML file based on your snippet:

<?xml version="1.0" encoding="utf-8" ?> 
   <Property Name="Printkode" Type="String" Access="ReadWrite">
      <Value>
          <![CDATA[%% d2m*DOKSTART|&#xB;d2m*OVERSKRIFT: 
          &quot;xx&quot;|&#xB;d2m*CPR:&quot;xxx&quot;]]>
      </Value>
   </Property>

This file can now be parsed into an XDocument, as in your code:

public void DeserializeXML()
{
    using var file = File.Open(@"C:\XMLFile1.xml", FileMode.Open);
    StreamReader sr = new StreamReader(file, Encoding.UTF8);
    var xml = sr.ReadToEnd();
    var result = XDocument.Parse(xml);
}

This results in the following:

[Image: debug results of the deserialized XML]

This way you would be able to get the content into your XDocument. The challenge then is to add CDATA sections everywhere they are needed, which depends entirely on your situation and the XML files you are given. I hope this helps a little!

Patrick
  • I have no control over the XML generation. It is produced by a Microsoft product. – Thomas Segato May 17 '21 at 19:38
  • Nevertheless, it should be possible to read it as a string first, identify all sections with troublesome hexadecimal values, put them into CDATA sections, and then deserialize it to an XDocument. – Patrick May 17 '21 at 19:41
  • But the problem is that I can't parse them, as half of the document cannot be parsed because of this error. I am starting to consider using a regex to remove all Property nodes. – Thomas Segato May 17 '21 at 19:42
  • Well, I actually thought you wanted to find a way to pass the document and its information along. If the Property node is completely irrelevant to you, which wasn't stated anywhere, there is surely a regex way to do it. – Patrick May 17 '21 at 19:50

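You could add a second check for escaped characters by decoding each entity and testing the decoded character:

// First pass: drop raw characters that are not valid in XML.
var xs = String.Concat(s.Where(ch => XmlConvert.IsXmlChar(ch)));
// Second pass: drop entities (such as &#xB;) whose decoded character is not valid in XML.
xs = Regex.Replace(xs, @"&#?\w+;", m => XmlConvert.IsXmlChar(WebUtility.HtmlDecode(m.Value)[0]) ? m.Value : "");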
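
For reference, a self-contained variant of the same two-pass idea could look like the sketch below. It is only an illustration under a few assumptions: the names `XmlSanitizer` and `RemoveInvalidCharacters` are hypothetical, numeric references are parsed by hand instead of going through `WebUtility.HtmlDecode`, and surrogate halves are kept so that characters outside the Basic Multilingual Plane are not stripped.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class XmlSanitizer
{
    // Matches numeric character references such as &#11; or &#xB;.
    private static readonly Regex NumericReference =
        new Regex(@"&#(?:[xX](?<hex>[0-9A-Fa-f]+)|(?<dec>[0-9]+));");

    public static string RemoveInvalidCharacters(string xml)
    {
        // Pass 1: drop raw characters that XML 1.0 does not allow.
        // Surrogates are kept on the assumption that they come in valid pairs.
        var cleaned = string.Concat(
            xml.Where(ch => XmlConvert.IsXmlChar(ch) || char.IsSurrogate(ch)));

        // Pass 2: drop numeric references that decode to a disallowed code point.
        return NumericReference.Replace(cleaned, match =>
        {
            Group hex = match.Groups["hex"];
            int codePoint;
            bool parsed = hex.Success
                ? int.TryParse(hex.Value, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(match.Groups["dec"].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);

            return parsed && IsValidXmlCodePoint(codePoint) ? match.Value : string.Empty;
        });
    }

    // The Char production of XML 1.0.
    private static bool IsValidXmlCodePoint(int cp) =>
        cp == 0x9 || cp == 0xA || cp == 0xD ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

In the question's method this would be called on the downloaded string right before parsing, for example `XDocument.Parse(XmlSanitizer.RemoveInvalidCharacters(xml))`. Named entities such as `&quot;` are left untouched because the pattern only targets numeric references.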
NetMage