I am reading a text file (HexText.txt) and using System.Xml.Serialization.XmlSerializer to serialize and then deserialize its contents. I can't post the contents of the file in this question because Stack Overflow encodes them incorrectly.

Serialization succeeds, but the Deserialize call throws this exception:

'', hexadecimal value 0x03, is an invalid character. Line 5, position 20.

What am I doing wrong?

Here is my code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XMLTester
{
    public class HexTester
    {
        public static void Main(string[] args)
        {

            var lines = File.ReadAllLines(@"..\..\HexText.txt");

            var foo = new Foo()
            {
                Items = new List<FooBar>()
                {
                    new FooBar()
                    {                            
                        Text = lines[0]                        
                    }
                }
            };

            string xml = SerializeToXML(foo);
            var objTabs = DeserializeFromXML(xml, typeof(Foo)); //This throws an error
        }

        static string SerializeToXML(object obj)
        {
            StringBuilder xml = new StringBuilder();
            XmlSerializer serializer = new XmlSerializer(obj.GetType());
            TextWriter textWriter = new StringWriter(xml);
            serializer.Serialize(textWriter, obj);
            textWriter.Close();

            return xml.ToString();
        }

        static object DeserializeFromXML(string xml, Type toType)
        {
            XmlSerializer deserializer = new XmlSerializer(toType);
            TextReader textReader = new StringReader(xml);

            Object obj = deserializer.Deserialize(textReader); //This throws an error
            textReader.Close();

            return obj;
        }

    }

    public class Foo
    {
        public List<FooBar> Items { get; set; }
    }
    public class FooBar
    {
        public string Text { get; set; }
    }
}
Praveen Reddy
  • What's the point of serializing a text file? – Lei Yang May 10 '17 at 01:13
  • I just get object Foo passed to my function. I gave the example to read from the file as StackOverflow was not allowing me to post the characters directly. – Praveen Reddy May 10 '17 at 01:29
  • I tested your code. I think the issue lies in the **file**. If you download the file directly, you'll see invisible characters. If I copy the contents from Google Drive instead, the program runs without exceptions. – Lei Yang May 10 '17 at 02:08
  • I know that the characters in the file are the issue. That's the reason I posted this question. I don't want to use the characters that Google Drive is showing. I want to use the original file. Do you think encoding the characters differently will work? – Praveen Reddy May 10 '17 at 02:11
  • Do you mean you want to serialize any binary file? I think you can convert to and from Base64 encoding. – Lei Yang May 10 '17 at 02:14

1 Answer

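0x03 (the End-of-Text control character) is simply not an allowed character in XML 1.0. Below 0x20, only tab (0x09), line feed (0x0A), and carriage return (0x0D) are permitted.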

So you will need to clean the invalid characters before you serialize or deserialize.
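
One way to do that cleaning is sketched below. The `XmlTextCleaner` class and its `RemoveInvalidXmlChars` method are hypothetical names introduced for illustration; they are not part of the question's code. The sketch relies on `XmlConvert.IsXmlChar` (available from .NET Framework 4.0 onward) and treats valid surrogate pairs separately so that characters above U+FFFF survive.

```csharp
using System.Text;
using System.Xml;

public static class XmlTextCleaner
{
    // Hypothetical helper (not part of the original post): returns a copy of
    // `text` without the characters that XML 1.0 forbids, such as 0x03.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var cleaned = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
            {
                // A valid surrogate pair encodes a code point above U+FFFF,
                // all of which XML 1.0 allows, so keep both halves.
                cleaned.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                cleaned.Append(c);
            }
            // Anything else (control characters, lone surrogates,
            // U+FFFE, U+FFFF) is dropped.
        }

        return cleaned.ToString();
    }
}
```

In the question's `Main`, this could be applied as `Text = XmlTextCleaner.RemoveInvalidXmlChars(lines[0])` before calling `SerializeToXML`. Note that this approach is lossy: the 0x03 character is removed and cannot be recovered after deserialization.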

stomtech
  • Do you think encoding the characters differently will work? – Praveen Reddy May 10 '17 at 02:11
  • I agree 0x03 is not an allowed character in XML. But I am looking for a way to serialize it into a valid character. Do you have any input on that? Can I encode the string differently before serializing? – Praveen Reddy May 12 '17 at 14:52
  • @CodingDawg : How about getting the serialized xml text to be enclosed in a [CDATA](https://en.wikipedia.org/wiki/CDATA) section (_like <![CDATA[invalid_text]]>_) with [XmlDocument.CreateCDataSection](https://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createcdatasection.aspx). Have a look at [serialize a string as CDATA using XmlSerializer](http://stackoverflow.com/a/1380050/7988857). – stomtech May 12 '17 at 18:24
  • The input that I receive is dynamic. Enclosing it in a CDATA section will not be viable. – Praveen Reddy May 12 '17 at 18:28
  • @CodingDawg : How about [converting the text to a base64 string](http://stackoverflow.com/a/8170810/7988857) before serializing? After you deserialize, you just use [Convert.FromBase64String](http://stackoverflow.com/a/7134853/7988857) to convert it back to regular text. – stomtech May 12 '17 at 18:34
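
The Base64 round trip suggested in the comments could look roughly like the sketch below. The `Base64Text` class and its `Encode` and `Decode` methods are hypothetical names chosen for illustration, and UTF-8 is an assumed choice of encoding; the thread does not specify one.

```csharp
using System;
using System.Text;

public static class Base64Text
{
    // Hypothetical helper: turns arbitrary text (including control characters
    // such as 0x03) into a Base64 string, which contains only characters that
    // are valid in XML.
    public static string Encode(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Convert.ToBase64String(bytes);
    }

    // Hypothetical helper: reverses Encode after deserialization.
    public static string Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

With this in place, the question's code would assign `Text = Base64Text.Encode(lines[0])` before serializing and call `Base64Text.Decode(...)` on the `Text` property after deserializing. Unlike stripping the invalid characters, this keeps the original 0x03 intact, at the cost of the XML no longer being human-readable.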