
I am generating an XML file to be submitted to a web service; sometimes, due to the amount of data, it exceeds 30 MB or 50 MB.

I need to compress the file using C# on .NET Framework 4.0, or rather just the one node that holds most of the data. I have no idea how to do it. Could someone give me an example of how to get this done, please?

The XML file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<HeaderTalk xmlns="http://www.w3schools.com/xml">
<EnvelopeVersion>2.0</EnvelopeVersion>
<Header>
<MessageDetails>
  <Class>CHAR-CLM</Class>      
</MessageDetails>
<SenderDetails>
  <IDAuthentication>
    <SenderID>aaaaaa</SenderID>
    <Authentication>
      <Method>MD5</Method>
      <Role>principal</Role>
      <Value>a3MweCsv60kkAgzEpXeCqQ==</Value>
    </Authentication>
  </IDAuthentication>
  <EmailAddress>Someone@somewhere.com</EmailAddress>
</SenderDetails>
</Header>
<TalkDetails>
  <ChannelRouting>
   <Channel>
     <URI>1953</URI>
     <Product>My product</Product>
     <Version>2.0</Version>
    </Channel>
</ChannelRouting>
</TalkDetails>
<Body>
   <envelope xmlns="http://www.w3schools.com/xml/">       
     <PeriodEnd>2013-08-13</PeriodEnd>
     <IRmark Type="generic">zZrxvJ7JmMNaOyrMs9ZOaRuihkg=</IRmark>
     <Sender>Individual</Sender>
     <Report>
       <AuthOfficial>
          <OffName>
            <Fore>B</Fore>
            <Sur>M</Sur>
          </OffName>
          <Phone>0123412345</Phone>
        </AuthOfficial>
    <DefaultCurrency>GBP</DefaultCurrency>
    <Claim>
      <OrgName>B</OrgName>
      <ref>AB12345</ref>
      <Repayment>
        <Account>
          <Donor>
            <Fore>Barry</Fore>
           </Donor>
            <Total>7.00</Total>              
        </Account>           
        <Account>
          <Donor>
            <Fore>Anthony</Fore>               
          </Donor>             
          <Total>20.00</Total>
        </Account>                  
      </Repayment>
      </Claim>
      </Report>
   </envelope>
 </Body>
</HeaderTalk>

The Claim node is what I want to compress, as it can contain millions of records.

I am a novice at coding. It has taken me a long time to get this XML generated, and I have been searching for a way to compress the node, but I just can't get it to work. The result needs to be exactly the same up to the DefaultCurrency node, and then:

 </AuthOfficial>
 <DefaultCurrency>GBP</DefaultCurrency>
 <CompressedPart Type="zip">UEsDBBQAAAAIAFt690K1</CompressedPart>
 </Report>
 </envelope>
 </Body>
 </HeaderTalk>

or

 </AuthOfficial>
 <DefaultCurrency>GBP</DefaultCurrency>
 <CompressedPart Type="gzip">UEsDBBQAAAAIAFt690K1</CompressedPart>
 </Report>
 </envelope>
 </Body>
 </HeaderTalk>

Thank you everyone in advance. Alternatively, if someone can suggest where to look for ideas on how to do this, that would help too.

To create the file, I am simply iterating through a dataset and writing the nodes using XmlElement objects, setting InnerText to my values.

The code I have used to write the Claim node is:

// claim
XmlElement GovtSenderClaim = xmldoc.CreateElement("Claim");
XmlElement GovtSenderOrgname = xmldoc.CreateElement("OrgName");
GovtSenderOrgname.InnerText = Charity_name;
GovtSenderClaim.AppendChild(GovtSenderOrgname);

XmlElement GovtSenderHMRCref = xmldoc.CreateElement("ref");
GovtSenderHMRCref.InnerText = strref;
GovtSenderClaim.AppendChild(GovtSenderHMRCref);

XmlElement GovtSenderRepayments = xmldoc.CreateElement("Repayment");
while (reader.Read())
{
    // one Account element per record
    XmlElement GovtSenderAccount = xmldoc.CreateElement("Account");
    XmlElement GovtSenderDonor = xmldoc.CreateElement("Donor");

    XmlElement GovtSenderfore = xmldoc.CreateElement("Fore");
    GovtSenderfore.InnerText = reader["EmployeeName_first_name"].ToString();
    GovtSenderDonor.AppendChild(GovtSenderfore);

    GovtSenderAccount.AppendChild(GovtSenderDonor);

    XmlElement GovtSenderTotal = xmldoc.CreateElement("Total");
    GovtSenderTotal.InnerText = reader["Total"].ToString();

    GovtSenderAccount.AppendChild(GovtSenderTotal);

    GovtSenderRepayments.AppendChild(GovtSenderAccount);
}
GovtSenderClaim.AppendChild(GovtSenderRepayments);

GovtSenderReport.AppendChild(GovtSenderClaim);

and the rest of the nodes to close..

user2664502
  • Can you modify the web service? – Dustin Kingen Aug 13 '13 at 14:55
  • If the web service supports `gzip` or `deflate` encoding, you might be able to send the file compressed without needing to change the data. – Gene Aug 13 '13 at 15:00
  • Back of the envelope calculation says that gzip can make your 50 MB into about 10 MB (possibly more, and not likely less), but then you have to base64 encode it, which will increase the size to about 13.5 MB. Is that good enough savings? – Jim Mischel Aug 13 '13 at 15:06
  • Hi Romoku.. this is a third party service so No I cant modify the webservice. – user2664502 Aug 13 '13 at 15:14

4 Answers


I need to compress the file, using c#, .net framework 4.0, rather one of the nodes

You can use GZip compression. Something like

// Requires: using System; using System.IO; using System.IO.Compression;

public static void Compress(FileInfo fileToCompress)
{
    // Skip hidden files and files that are already compressed.
    bool isHidden = (File.GetAttributes(fileToCompress.FullName) & FileAttributes.Hidden) == FileAttributes.Hidden;
    if (isHidden || fileToCompress.Extension == ".gz")
    {
        return;
    }

    string compressedPath = fileToCompress.FullName + ".gz";

    using (FileStream originalFileStream = fileToCompress.OpenRead())
    using (FileStream compressedFileStream = File.Create(compressedPath))
    using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
    {
        originalFileStream.CopyTo(compressionStream);
    }

    // Read the size only after the streams are closed, so that all
    // compressed data has been flushed to disk.
    Console.WriteLine("Compressed {0} from {1} to {2} bytes.",
        fileToCompress.Name, fileToCompress.Length, new FileInfo(compressedPath).Length);
}

public static void Decompress(FileInfo fileToDecompress)
{
    // Strip the ".gz" extension to get the output file name.
    string currentFileName = fileToDecompress.FullName;
    string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    using (FileStream decompressedFileStream = File.Create(newFileName))
    using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
    {
        decompressionStream.CopyTo(decompressedFileStream);
        Console.WriteLine("Decompressed: {0}", fileToDecompress.Name);
    }
}

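Another possible way of doing it is with DeflateStream. See here. The major difference between GZipStream and DeflateStream is that GZipStream wraps the same Deflate-compressed data in a gzip header and trailer, and the trailer includes a CRC-32 checksum so that corrupted data can be detected.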
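
To make the difference concrete, here is a minimal sketch that compresses the same bytes with both stream types. The class and method names (`FormatComparison`, `Deflate`, `GZip`) are made up for this illustration; only the `System.IO.Compression` types come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the answer above.
public static class FormatComparison
{
    // Raw Deflate output: no header, no checksum.
    public static byte[] Deflate(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var ds = new DeflateStream(ms, CompressionMode.Compress))
            {
                ds.Write(data, 0, data.Length);
            }
            // ToArray still works after the compression stream closed ms.
            return ms.ToArray();
        }
    }

    // Same Deflate data wrapped in a gzip header and a trailer
    // holding a CRC-32 and the uncompressed length.
    public static byte[] GZip(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            return ms.ToArray();
        }
    }
}
```

The gzip output starts with the magic bytes 0x1F 0x8B and is slightly larger than the raw Deflate output for the same input, so for a multi-megabyte node the size difference between the two is negligible.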

Community
Ehsan
  • Why not `Deflate`? It's normally better. – xanatos Aug 13 '13 at 14:54
  • @xanatos there are many ways. Added link to Deflate as well at the bottom :) – Ehsan Aug 13 '13 at 14:57
  • Hi Ehsan But this one will compress the whole xml file , I want just one NODE in the xml file to be compressed. – user2664502 Aug 13 '13 at 15:07
  • 1) The .NET GZip is just Deflate with an added header of a few bytes, so the size isn't much different. 2) The OP is asking how to compress part of the file, replacing the Claim node(s) with a CompressedPart node. That's much different than compressing the entire file as you suggest. – Jim Mischel Aug 13 '13 at 15:09

You can try this: it will compress only the nodes you select. It's a little different from what you asked, because it will replace the content of the element, leaving the element + its attributes as they were.

{
    // You are using a namespace! 
    XNamespace ns = "http://www.w3schools.com/xml/";

    var xml2 = XDocument.Parse(xml);

    // Compress
    {
        // Will compress all the XElement that are called Claim
        // You should probably select the XElement in a better way
        var nodes = from p in xml2.Descendants(ns + "Claim") select p;

        foreach (XElement el in nodes)
        {
            CompressElementContent(el);
        }
    }

    // Decompress
    {
        // Will decompress all the XElement that are called Claim
        // You should probably select the XElement in a better way
        var nodes = from p in xml2.Descendants(ns + "Claim") select p;

        foreach (XElement el in nodes)
        {
            DecompressElementContent(el);
        }
    }
}

public static void CompressElementContent(XElement el)
{
    string content;

    using (var reader = el.CreateReader())
    {
        reader.MoveToContent();
        content = reader.ReadInnerXml();
    }

    using (var ms = new MemoryStream())
    {
        using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Compress))
        {
            // Build the encoder manually so that no BOM is written.
            // See for example http://stackoverflow.com/a/2437780/613130
            // But note that false is implicit in the parameterless constructor
            using (StreamWriter sw = new StreamWriter(defl, new UTF8Encoding()))
            {
                sw.Write(content);
            }
        }

        string base64 = Convert.ToBase64String(ms.ToArray());

        el.ReplaceAll(new XText(base64));
    }
}

public static void DecompressElementContent(XElement el)
{
    var reader = el.CreateReader();
    reader.MoveToContent();
    var content = reader.ReadInnerXml();

    var bytes = Convert.FromBase64String(content);

    using (var ms = new MemoryStream(bytes))
    {
        using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Decompress))
        {
            using (StreamReader sr = new StreamReader(defl, Encoding.UTF8))
            {
                el.ReplaceAll(ParseXmlFragment(sr));
            }
        }
    }
}

public static IEnumerable<XNode> ParseXmlFragment(StreamReader sr)
{
    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };

    using (var xmlReader = XmlReader.Create(sr, settings))
    {
        xmlReader.MoveToContent();

        while (xmlReader.ReadState != ReadState.EndOfFile)
        {
            yield return XNode.ReadFrom(xmlReader);
        }
    }
}

Decompression is more complex, because it is difficult to replace the content of an XML element with a fragment. In the end I split the content XNode by XNode in ParseXmlFragment and pass the result to ReplaceAll in DecompressElementContent.

As a side note, you have two similar-but-different namespaces in your XML: http://www.w3schools.com/xml and http://www.w3schools.com/xml/

This other variant will do exactly what you asked (so it will create a CompressedPart node) minus the attribute with the type of compression.

{
    XNamespace ns = "http://www.w3schools.com/xml/";

    var xml2 = XDocument.Parse(xml);

    // Compress
    {
        // Here the ToList() is necessary, because we will replace the selected elements
        var nodes = (from p in xml2.Descendants(ns + "Claim") select p).ToList();

        foreach (XElement el in nodes)
        {
            CompressElementContent(el);
        }
    }

    // Decompress
    {
        // Here the ToList() is necessary, because we will replace the selected elements
        var nodes = (from p in xml2.Descendants("CompressedPart") select p).ToList();

        foreach (XElement el in nodes)
        {
            DecompressElementContent(el);
        }
    }
}

public static void CompressElementContent(XElement el)
{
    string content = el.ToString();

    using (var ms = new MemoryStream())
    {
        using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Compress))
        {
            // Build the encoder manually so that no BOM is written.
            using (StreamWriter sw = new StreamWriter(defl, new UTF8Encoding()))
            {
                sw.Write(content);
            }
        }

        string base64 = Convert.ToBase64String(ms.ToArray());

        var newEl = new XElement("CompressedPart", new XText(base64));
        el.ReplaceWith(newEl);
    }
}

public static void DecompressElementContent(XElement el)
{
    var reader = el.CreateReader();
    reader.MoveToContent();
    var content = reader.ReadInnerXml();

    var bytes = Convert.FromBase64String(content);

    using (var ms = new MemoryStream(bytes))
    {
        using (DeflateStream defl = new DeflateStream(ms, CompressionMode.Decompress))
        {
            using (StreamReader sr = new StreamReader(defl, Encoding.UTF8))
            {
                var newEl = XElement.Parse(sr.ReadToEnd());
                el.ReplaceWith(newEl);
            }
        }
    }
}
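
Two details of this variant may matter for the target service: it leaves out the Type attribute, and `new XElement("CompressedPart", ...)` creates the element in no namespace, so when saved inside a parent with a default namespace it is written with `xmlns=""`. The sketch below is one possible adjustment, assuming the service expects CompressedPart in the same namespace as Claim and a gzip payload; both are assumptions, and `ClaimCompressor` and `ReplaceWithCompressedPart` are names invented here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper; assumes gzip and the Claim element's namespace.
public static class ClaimCompressor
{
    public static void ReplaceWithCompressedPart(XElement el)
    {
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            using (var sw = new StreamWriter(gz, new UTF8Encoding(false)))
            {
                // Serialize without indentation to keep the payload small.
                sw.Write(el.ToString(SaveOptions.DisableFormatting));
            }
            compressed = ms.ToArray();
        }

        // Reuse the namespace of the replaced element so that no
        // xmlns="" declaration appears on CompressedPart.
        var part = new XElement(el.Name.Namespace + "CompressedPart",
            new XAttribute("Type", "gzip"),
            Convert.ToBase64String(compressed));

        el.ReplaceWith(part);
    }
}
```

The decompress side would then need to look up `ns + "CompressedPart"` instead of the unqualified name.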
Patrick from NDepend team
xanatos

What you're asking is possible, but somewhat involved. You'll need to create the compressed node in memory and then write it. I don't know how you're writing your XML, so I'll assume that you have something like:

open xml writer
write <MessageDetails>
write <SenderDetails>
write other nodes
write Claim node
write other stuff
close file

To write your claim node, you'll want to write to an in-memory stream and then base64 encode it. The resulting string is what you write to the file as your <CompressedPart>.

string compressedData;
using (MemoryStream ms = new MemoryStream())
{
    // leaveOpen: true keeps ms usable after the GZipStream is disposed
    using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, true))
    {
        using (XmlWriter writer = XmlWriter.Create(gz))
        {
            writer.WriteStartElement("Claim");
            // write claim stuff here
            writer.WriteEndElement();
        }
    }
    // now base64 encode the compressed bytes; ToArray returns only the
    // bytes written, whereas GetBuffer also includes unused capacity
    byte[] buff = ms.ToArray();
    compressedData = Convert.ToBase64String(buff, 0, buff.Length);
}

Your data is then in the compressedData string, which you can write as element data.

As I said in my comment, GZip will typically take 80% off your raw XML size, so that 50 MB becomes 10 MB. But base64 encoding will add 33% to the compressed size. I'd expect the result to be approximately 13.5 MB.
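
The base64 overhead can be worked out exactly: every 3 input bytes become 4 output characters, with the last group padded. A small sketch, where `SizeEstimate` and `Base64Length` are names made up for this illustration:

```csharp
using System;

// Hypothetical helper for estimating the encoded size.
public static class SizeEstimate
{
    // Base64 emits 4 characters for every group of 3 bytes,
    // rounding the number of groups up (the last one is padded).
    public static long Base64Length(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }
}
```

For a compressed payload of 10 × 1024 × 1024 bytes, `SizeEstimate.Base64Length` returns 13,981,016 characters, a little over 13 MB. The 10 MB input is itself only an estimate, so the real figure depends on how well your data compresses.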

Update

Based on your additional code, what you're trying to do doesn't look too difficult. I think what you want to do is:

// do a bunch of stuff
GovtSenderClaim.AppendChild(GovtSenderRepayments);

// start of added code

// compress the GovtSenderClaim element
// This code writes the GovtSenderClaim element to a compressed MemoryStream.
// We then read the MemoryStream and create a base64 encoded representation.
string compressedData;
using (MemoryStream ms = new MemoryStream())
{
    // leaveOpen: true keeps ms usable after the GZipStream is disposed
    using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, true))
    {
        using (StreamWriter writer = new StreamWriter(gz))
        {
            // XmlElement has no Save method (that is XElement.Save),
            // so write its serialized markup instead.
            writer.Write(GovtSenderClaim.OuterXml);
        }
    }
    // now base64 encode the compressed bytes
    byte[] buff = ms.ToArray();
    compressedData = Convert.ToBase64String(buff, 0, buff.Length);
}

// compressedData now contains the compressed Claim node, encoded in base64.

// create the CompressedPart element
XmlElement CompressedPart = xmldoc.CreateElement("CompressedPart");
CompressedPart.SetAttribute("Type", "gzip");
CompressedPart.InnerText = compressedData;

GovtSenderReport.AppendChild(CompressedPart);
// GovtSenderReport.AppendChild(GovtSenderClaim);
Jim Mischel
  • Hi Jim , I am using the normal Xmlelemnt, create element, appendchild etc I have edited the question and added the code I have used.. Thanks – user2664502 Aug 15 '13 at 10:19
  • Hi Jim, The only problem is that I am not using the Xelement when creating the file .. doing it the old way. so dont have the SAVE. for xmlElement. – user2664502 Aug 15 '13 at 14:45
  • @user2664502: That call to `Save` is writing the element to the memory stream. That's what compresses it. The code then reads that memory stream and creates the `compressedData` string, which is written to the `CompressedPart` element. I'll add some comments to the example. – Jim Mischel Aug 15 '13 at 14:48

This is what I have done to make it work ..

public void compressTheData(string xml)
{
    // xml is the path of the file to load, modify and save back
    XNamespace ns = "http://www.w3schools.com/xml/";
    var xml2 = XDocument.Load(xml);

    // Compress: ToList() is needed because the selected elements get replaced
    var nodes = (from p in xml2.Descendants(ns + "Claim") select p).ToList();
    foreach (XElement el in nodes)
    {
        CompressElementContent(el);
    }

    xml2.Save(xml);
}

public static void CompressElementContent(XElement el)
{
    string content = el.ToString();

    using (var ms = new MemoryStream())
    {
        using (GZipStream defl = new GZipStream(ms, CompressionMode.Compress))
        {
            using (StreamWriter sw = new StreamWriter(defl))
            {
                sw.Write(content);
            }
        }

        // Replace the Claim element with <CompressedPart Type="gzip">base64</CompressedPart>
        string base64 = Convert.ToBase64String(ms.ToArray());
        XElement newEl = new XElement("CompressedPart", new XText(base64));
        XAttribute attrib = new XAttribute("Type", "gzip");
        newEl.Add(attrib);
        el.ReplaceWith(newEl);
    }
}
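
One way to sanity-check the output before submitting it is to decode a CompressedPart value back into XML and compare it with the original Claim element. This is only a sketch: `CompressedPartCheck` and its methods are names invented here, and it assumes the payload is gzip data encoded as base64, as produced by the code above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical round-trip helper for checking a CompressedPart value.
public static class CompressedPartCheck
{
    // Mirrors the compression step above: gzip, then base64.
    public static string Compress(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            using (var sw = new StreamWriter(gz, new UTF8Encoding(false)))
            {
                sw.Write(text);
            }
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses it: base64 decode, then gunzip back to the XML text.
    public static string Decompress(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        using (var ms = new MemoryStream(bytes))
        using (var gz = new GZipStream(ms, CompressionMode.Decompress))
        using (var sr = new StreamReader(gz, Encoding.UTF8))
        {
            return sr.ReadToEnd();
        }
    }
}
```

Passing the text of a CompressedPart element to `Decompress` should give back the serialized Claim element.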

Thank you everyone for your inputs.

user2664502