I know this has been asked many times before, but I still don't see a good solution.
There is an object like this:

public class DTO
{
    public string Value;
}

I need to serialize it in the Exporter app and then deserialize it in the Importer.
The object's Value may contain characters that are not valid in XML (e.g. 0x8). I need either the Exporter to remove such characters or the Importer to successfully load an object containing them. I'd rather not clean up the objects before serialization, because I have dozens of them, each with dozens of string properties.

  1. Exporter side. If I enable CheckCharacters here, I get an error at the serialization step, and I don't see a way to handle all strings in one place. If I disable it, the XML will contain the invalid character.

    XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { CheckCharacters = false };
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(DTO));
    StringBuilder sb = new StringBuilder();
    DTO dto = new DTO { Value = Convert.ToChar(0x08).ToString() };
    
    using (XmlWriter xmlWriter = XmlWriter.Create(sb, xmlWriterSettings))
    {
        xmlSerializer.Serialize(xmlWriter, dto); 
        xmlWriter.Flush();
        xmlWriter.Close();
    }
    
  2. OK, if I let the invalid character into the XML, there seems to be no way to handle it on the Importer side. Even with CheckCharacters = false, the error occurs in the Deserialize() call:

    var _reader = XmlReader.Create(File.OpenText(path), new XmlReaderSettings() { CheckCharacters = false });
    _reader.MoveToContent();
    var outerXml = _reader.ReadOuterXml();
    xmlSerializer.Deserialize(new StringReader(outerXml)); // <== getting error here
    

Is there a way to remove the invalid characters at either step so that the object can be exported and imported without errors?

LINQ2Vodka

2 Answers

That was my bad :(
Here:

    var outerXml = _reader.ReadOuterXml();
    xmlSerializer.Deserialize(new StringReader(outerXml)); // <== getting error here

Because it was given a StringReader, xmlSerializer implicitly created its own internal XmlReader, and that one did check characters. All I had to do four hours ago was pass in the reader I already had (the one created with CheckCharacters = false):

    xmlSerializer.Deserialize(_reader);
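
For completeness, here is a minimal round-trip sketch of the fix. It assumes the DTO class from the question. The `DtoXml` helper and its `Export`/`Import` methods are hypothetical names introduced only for this illustration. Both the writer and the reader are created with CheckCharacters = false, and the serializer is handed the reader directly instead of a StringReader.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class DTO
{
    public string Value;
}

// Hypothetical helper for illustration; not part of the original apps.
public static class DtoXml
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(DTO));

    // Exporter side: the writer does not validate characters, so
    // serialization does not throw on values such as 0x08.
    public static string Export(DTO dto)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { CheckCharacters = false };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            Serializer.Serialize(writer, dto);
        }
        return sb.ToString();
    }

    // Importer side: pass the non-checking XmlReader itself to Deserialize,
    // so the serializer does not create its own character-checking reader.
    public static DTO Import(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return (DTO)Serializer.Deserialize(reader);
        }
    }
}
```

With this, `DtoXml.Import(DtoXml.Export(dto))` should give back the original Value, control character included. Keep in mind that the intermediate document is not well-formed XML 1.0, so other parsers may still reject it.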
LINQ2Vodka

I'm not saying this is a great solution, but the code below will remove non-UTF-8 characters when serializing:

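    // Requires: using System.Linq; using System.Text;
    public class DTO
    {
        private string _value;

        public string Value
        {
            // Truncates each char to a single byte and decodes the bytes as UTF-8.
            // Returns null when no value was set, to avoid a NullReferenceException.
            get { return _value == null ? null : Encoding.UTF8.GetString(_value.Select(x => (byte)x).ToArray()); }
            set { _value = value; }
        }
    }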
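
If the goal is specifically to drop characters that XML 1.0 does not allow (such as 0x8), a different technique from the byte cast above is to filter with XmlConvert.IsXmlChar. The sketch below is only an illustration under that assumption. `XmlText` and `StripInvalidXmlChars` are hypothetical names, and surrogate pairs are kept explicitly so that characters outside the BMP survive.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of the original answer.
public static class XmlText
{
    // Returns a copy of text without characters that are invalid in XML 1.0.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
            {
                // Keep a valid surrogate pair as a unit.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                // Keep any single character that XML allows.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

A getter could then return `XmlText.StripInvalidXmlChars(_value)`, which keeps the cleanup in one place, although each string property still needs to call it.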
jdweng