
I have a problem with XmlSerializer. My huge XML file contains some null characters (\u0000), so XmlSerializer throws an error while deserializing. I found out that I need to set Normalization to false (via: https://msdn.microsoft.com/en-us/library/aa302290.aspx), so I tried this:

```csharp
XmlSerializer deserializer = new XmlSerializer(typeof(T));
using (XmlTextReader reader = new XmlTextReader(filename))
{
    reader.Normalization = false;
    return (T)deserializer.Deserialize(reader);
}
```

I also tried a second approach, using XmlReader (also suggested by MSDN) with CheckCharacters set to false:

```csharp
XmlSerializer deserializer = new XmlSerializer(typeof(T));
XmlReaderSettings settings = new XmlReaderSettings { CheckCharacters = false };
using (XmlReader reader = XmlReader.Create(filename, settings))
{
    return (T)deserializer.Deserialize(reader);
}
```

But both solutions give me the same result: an InvalidOperationException pointing at the line and column in the XML where the null character is.

Could you please give me some advice? I need to "load" the XML structure into my own class. Without the lines containing these characters, it works fine.

Thanks! :)

Edit: I forgot to mention that I've tried loading the content into a string and updating the string, but the content is too big, so I get a System.OutOfMemoryException. Parsing the file line by line is too slow. :(

dymanoid
  • this may be helpful.. https://stackoverflow.com/questions/306877/can-xmlserializer-deserialize-into-a-nullableint – JSR Aug 23 '17 at 08:51
  • Or, more likely, [Escape invalid XML characters in C#](https://stackoverflow.com/a/17735649). – dbc Aug 23 '17 at 08:58
  • Thanks! But all of these methods are based on loading the content into a string, and my file is really huge, so I get a System.OutOfMemoryException. I tried this before. And if I parse it line by line, it's too slow for my use. :( – Andrew Lerion Aug 23 '17 at 10:01
  • Does the file contain `NUL` codepoints, or does it contain entities invalid in XML (like ``)? In the former case, you can sidestep the issue by creating a `TextReader` that replaces the `NUL` characters with something else before the `XmlTextReader` sees them. If the entities themselves are invalid, that's a little too complicated (but then `CheckCharacters` should have taken care of that). Which class is throwing the exception, though -- `XmlTextReader` or `XmlSerializer`? – Jeroen Mostert Aug 23 '17 at 11:53
  • Also, *why* does it contain `NUL` characters, any clue? I'm guessing the file wasn't *produced* by an `XmlSerializer`, or was it? – Jeroen Mostert Aug 23 '17 at 12:01
  • @JeroenMostert Ondrej Svejdar helped me fix it, but thank you too! :) It was like Blah blah NULL blah blah. I don't know how the file was produced; I just downloaded it. – Andrew Lerion Aug 28 '17 at 09:41
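Jeroen Mostert's question (literal `NUL` characters versus character references) is the crux here. According to the `XmlReaderSettings.CheckCharacters` documentation, setting it to false only turns off checking for character entity references; text content is still checked. A minimal sketch of the difference, assuming .NET's `System.Xml` (the names `EntityCheck` and `ReadRootText` are hypothetical):

```csharp
using System.IO;
using System.Xml;

public static class EntityCheck
{
    // Reads the text content of the root element with character checking disabled.
    public static string ReadRootText(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

`ReadRootText("<a>x&#0;y</a>")` should return a string containing a NUL between `x` and `y`, while `ReadRootText("<a>x\0y</a>")` (a literal NUL) should still throw an `XmlException`, consistent with the error in the question. So if the file contains literal NULs, they have to be removed before the XML reader sees them.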

1 Answer


You can work at the reader level instead: subclass TextReader so that it strips the NUL characters as they are read, and pass that reader to the XmlSerializer.

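```csharp
var deserializer = new XmlSerializer(typeof(T));
T instance;
// Wrap the underlying TextReader (e.g. a StreamReader over the file);
// disposing the wrapper also disposes the inner reader.
using (var cleanupTextReader = new CleanupTextReader(new StreamReader(filename)))
{
    instance = (T)deserializer.Deserialize(cleanupTextReader);
}
```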

Where CleanupTextReader is something like:

```csharp
using System;
using System.IO;
using System.Text;

internal sealed class CleanupTextReader : TextReader
{
    private readonly TextReader _in;

    internal CleanupTextReader(TextReader t)
    {
        _in = t ?? throw new ArgumentNullException(nameof(t));
    }

    // TextReader.Close() calls Dispose(true), so Close is not overridden.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _in.Dispose();
        }
        base.Dispose(disposing);
    }

    public override int Peek()
    {
        // Discard pending NULs so that Peek() agrees with Read().
        while (_in.Peek() == '\u0000')
        {
            _in.Read();
        }
        return _in.Peek();
    }

    public override int Read()
    {
        // Skip NULs; -1 (end of input) is passed through unchanged.
        while (true)
        {
            var result = _in.Read();
            if (result != '\u0000')
            {
                return result;
            }
        }
    }

    private static string CleanupString(string value)
    {
        // Fast path: nothing to strip.
        if (string.IsNullOrEmpty(value) || value.IndexOf('\u0000') < 0)
        {
            return value;
        }
        var builder = new StringBuilder(value.Length);
        foreach (var ch in value)
        {
            if (ch != '\u0000')
            {
                builder.Append(ch);
            }
        }
        return builder.ToString();
    }

    // Compacts buffer[index .. index + count) in place, dropping NULs,
    // and returns the number of characters kept.
    private static int CleanupBuffer(char[] buffer, int index, int count)
    {
        var writeIndex = index;
        for (var readIndex = index; readIndex < index + count; readIndex++)
        {
            var ch = buffer[readIndex];
            if (ch != '\u0000')
            {
                buffer[writeIndex++] = ch;
            }
        }
        return writeIndex - index;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int reallyRead = _in.Read(buffer, index, count);
            if (reallyRead <= 0)
            {
                return reallyRead; // genuine end of input
            }

            int cleanRead = CleanupBuffer(buffer, index, reallyRead);
            if (cleanRead != 0)
            {
                return cleanRead;
            }
            // The chunk held only NULs; returning 0 would signal end of input,
            // so read the next chunk instead.
        }
    }

    // ReadBlock is not overridden: the base implementation calls
    // Read(char[], int, int) until the block is full or the input ends.

    public override string ReadLine()
    {
        return CleanupString(_in.ReadLine());
    }

    public override string ReadToEnd()
    {
        return CleanupString(_in.ReadToEnd());
    }
}
```
Ondrej Svejdar
  • This fails if any call to `Read` ever reads nothing but `NUL` characters, because your method will then return `0`, signaling that there are no more characters. – Jeroen Mostert Aug 23 '17 at 15:17
  • Thank you very much! :) I passed the StreamReader to the CleanupTextReader and it works like magic! :) – Andrew Lerion Aug 28 '17 at 09:37
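The case in Jeroen Mostert's comment (a chunk made up entirely of NULs) is handled by reading again rather than returning 0. Below is a compact, self-contained sketch to check that end to end. `NulStrippingReader`, `NulDemo` and `Item` are hypothetical names, and the reader is a cut-down variant of `CleanupTextReader`, not the answer's exact class:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Cut-down NUL filter: only the two Read overloads (plus Dispose) are overridden,
// which is assumed to be enough for this demo.
public sealed class NulStrippingReader : TextReader
{
    private readonly TextReader _inner;

    public NulStrippingReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override int Read()
    {
        int ch;
        do { ch = _inner.Read(); } while (ch == '\0'); // -1 (end) passes through
        return ch;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int read = _inner.Read(buffer, index, count);
            if (read <= 0) return read; // real end of input

            // Compact the chunk in place, dropping NULs.
            int write = index;
            for (int i = index; i < index + read; i++)
            {
                if (buffer[i] != '\0') buffer[write++] = buffer[i];
            }

            // A chunk of only NULs must not return 0 (that means end of input).
            if (write > index) return write - index;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class NulDemo
{
    // Deserializes T from an XML string, dropping NULs on the way in.
    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new NulStrippingReader(new StringReader(xml)))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

public class Item
{
    public string Name { get; set; }
}
```

For example, `NulDemo.Deserialize<Item>("<Item><Name>Blah\0 blah</Name></Item>")` should yield an `Item` whose `Name` is `"Blah blah"`. Reading 3 characters from `"\0\0\0abc"` returns `abc` rather than 0.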