I want to write an xml document to disk in a compact format. To this end, I use the net framework method XmlDictionaryWriter.CreateBinaryWriter(Stream stream,IXmlDictionary dictionary)
This method writes a custom compact binary xml representation, that can later be read by XmlDictionaryWriter.CreateBinaryReader
. The method accepts an XmlDictionary
that can contain common strings, so that those strings do not have to be printed in the output each time. Instead of the string, the dictionary index will be printed in the file. CreateBinaryReader
can later use the same dictionary to reverse the process.
However the dictionary I pass is apparently not used. Consider this code:
using System.IO;
using System.Xml;
using System.Xml.Linq;
class Program
{
public static void Main()
{
XmlDictionary dict = new XmlDictionary();
dict.Add("myLongRoot");
dict.Add("myLongAttribute");
dict.Add("myLongValue");
dict.Add("myLongChild");
dict.Add("myLongText");
XDocument xdoc = new XDocument();
xdoc.Add(new XElement("myLongRoot",
new XAttribute("myLongAttribute", "myLongValue"),
new XElement("myLongChild", "myLongText"),
new XElement("myLongChild", "myLongText"),
new XElement("myLongChild", "myLongText")
));
using (Stream stream = File.Create("binaryXml.txt"))
using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
{
xdoc.WriteTo(writer);
}
}
}
The produced output is this (binary control characters not shown)
@
myLongRootmyLongAttribute˜myLongValue@myLongChild™
myLongText@myLongChild™
myLongText@myLongChild™
myLongText
So apparently the XmlDictionary has not been used. All strings appear in their entirety in the output, even multiple times.
This is not a problem limited to XDocument. In the above minimal example I used a XDocument to demonstrate the problem, but originally I stumbled upon this while using XmlDictionaryWriter in conjunction with a DataContractSerializer, as it is commonly used. The results were the same:
[Serializable]
public class myLongChild
{
public double myLongText = 0;
}
...
using (Stream stream = File.Create("binaryXml.txt"))
using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
{
var dcs = new DataContractSerializer(typeof(myLongChild));
dcs.WriteObject(writer, new myLongChild());
}
The resulting output did not use my XmlDictionary.
How can I get XmlDictionaryWriter to use the suplied XmlDictionary?
Or have I misunderstood how this works?
with the DataContractSerializer approach, I tried debugging the net framework code (visual studio/options/debugging/enable net. framework source stepping). Apparently the Writer does attempt to lookup each of the above strings in the dictionary, as expected. However the lookup fails in line 356 of XmlbinaryWriter.cs, for reasons that are not clear to me.
Alternatives I have considered:
There is an overload for XmlDictionaryWriter.CreatebinaryWriter, that also accepts a XmlBinaryWriterSession. The writer then adds any new strings it encounters into the session dictionary. However, I want to only use a static dictionary for reading and writing, which is known beforehand.
I could wrap the whole thing into a
GzipStream
and let the compression take care of the multiple instances of strings. However, this would not compress the first instance of each string, and seems like a clumsy workaround overall.