
I have some very large XML files (800 MB to 1.5 GB) that I need to apply XSLT to. I am able to read them with XmlTextReader, but when I apply the XSLT transformation I get a System.OutOfMemoryException.

My code looks like this:

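using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

class Program
{
    static void Main(string[] args)
    {
        XDocument newTree = new XDocument();

        using (XmlTextReader oReader = new XmlTextReader(@"C:\Projects\myxml.xml"))
        using (XmlWriter writer = newTree.CreateWriter())
        {
            XslCompiledTransform oTransform = new XslCompiledTransform();
            oTransform.Load(@"C:\Projects\myXSLT.xsl");
            // The result is written into newTree, so the whole output tree
            // is kept in memory until it is printed below.
            oTransform.Transform(oReader, writer);
        }
        Console.WriteLine(newTree);
    }
}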

Thanks in advance. This is very urgent. If I don't find a solution, I will need to split the XML into smaller files and transform each one.

John Saunders
jvm
  • Don't have a solution, but splitting up the large input file is probably the best you can do. It also depends to a large extent on what your XSLT does. – Dirk Vollmar Jun 23 '10 at 11:18

3 Answers


XSLT uses XPath, and this requires that the whole XML document be held in memory. Running out of memory on very large inputs is therefore inherent to the approach.

There are simple rules of thumb for approximating how much memory is needed; one of them says 5 × the size of the XML text.

So, for a typical 1.5 GB XML file, 8 GB of RAM may be sufficient (5 × 1.5 GB = 7.5 GB).
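A minimal C# sketch of that rule of thumb, assuming the 5× factor applies to your document (actual usage depends on the document's shape and on the stylesheet). The class and member names are hypothetical, and the 2 GB threshold is an approximation of the default address space of a 32-bit process on Windows.

```csharp
using System.IO;

// Hypothetical helper applying the "5 x text size" rule of thumb.
static class XsltMemoryEstimate
{
    // Assumed multiplier from the rule of thumb; real usage varies.
    public const long Factor = 5;

    // Approximate default limit for a 32-bit process on Windows (2 GB).
    public const long ThirtyTwoBitLimitBytes = 2L * 1024 * 1024 * 1024;

    // Estimated bytes needed to hold the parsed tree of an XML file of the given size.
    public static long EstimateBytes(long xmlBytes) => xmlBytes * Factor;

    // Same estimate, taking the size from a file on disk.
    public static long EstimateBytesForFile(string path) =>
        EstimateBytes(new FileInfo(path).Length);

    // True when the estimate exceeds what a 32-bit process can normally address.
    public static bool NeedsMoreThan32BitAddressSpace(long xmlBytes) =>
        EstimateBytes(xmlBytes) > ThirtyTwoBitLimitBytes;
}
```

If the estimate is above the 32-bit limit, the transform has to run in a 64-bit process; `Environment.Is64BitProcess` tells you which kind of process you are in.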

Either split the document into smaller parts or wait for an implementation of XSLT 2.1 (later renamed XSLT 3.0), which defines special streaming instructions. In the meantime, you can use the latest (commercial) version of Saxon, which implements extensions for streaming; successful processing of a 64 GB document has been reported on Twitter.

Dimitre Novatchev
  • +1. However, XSLT 2.x in the context of .NET is probably something we can dream about forever. – Dirk Vollmar Jun 23 '10 at 14:03
  • @0xA3: Why not? There is Saxon.NET. – Dimitre Novatchev Jun 23 '10 at 14:31
  • Have you ever tried it? Saxon is great with Java, but terribly slow on .NET – Dirk Vollmar Jun 23 '10 at 16:11
  • @0xA3: Yes, it works pretty well. Not terribly slow, maybe 1.80 times slower than the Java version. One could make it even faster by NGEN-ing the Saxon.NET binaries. – Dimitre Novatchev Jun 23 '10 at 16:23
  • I see. So that means you can basically access any element of the XML document from within any template in the XSLT? One consequence would be that you cannot split up the original XML document either, unless you know the semantics exactly and where it is safe to split it, because otherwise parts of the XML could be missing when a template tries to access a certain XPath. – chiccodoro Jun 24 '10 at 06:33

We are facing a similar problem. The solution we came up with was not to use XSLT in this case, and instead to use LINQ to XML transformations while streaming the data. You can leverage the C# yield keyword to iterate through an XML stream and tackle the file piecemeal this way. See streaming with LINQ to XML.
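The yield-based streaming idea can be sketched like this. It is a minimal version of the common `XmlReader` + `XNode.ReadFrom` pattern, not necessarily the exact code we used; the `XmlStreaming` class name and the element name you pass in are hypothetical. Only one matching element is materialized at a time, so memory stays proportional to the largest record rather than to the whole file.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Lazily yields every element whose qualified name is `elementName`
    // without loading the whole document. Matches nested inside a yielded
    // element come back as part of that element, not separately.
    public static IEnumerable<XElement> StreamElements(string path, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and leaves the reader on the
                    // next node, so Read() must not be called here or a directly
                    // following sibling would be skipped.
                    if (XNode.ReadFrom(reader) is XElement element)
                        yield return element;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For example, `foreach (XElement order in XmlStreaming.StreamElements(@"C:\Projects\myxml.xml", "order")) { ... }` would process one (hypothetical) `order` element at a time with ordinary LINQ to XML queries.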

The nature of XSLT requires the XML to be loaded into memory, so you need to break the large file down into more manageable pieces. If you use the XML streaming technique, you can break the document up into sub-elements and apply the XSLT to each one individually. You may have to rewrite the XSLT to accommodate this.
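A minimal sketch of transforming piece by piece, assuming the records are repeated elements under the root, that the stylesheet has been rewritten to match a single record as its document element, and that each record transforms to exactly one element. `FragmentTransformer`, the `recordName` parameter and the `<results>` wrapper are hypothetical names. Templates that need to see other parts of the original document will not work this way.

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class FragmentTransformer
{
    // Streams the input, transforms each <recordName> element on its own,
    // and writes every result under one hypothetical <results> root.
    public static void TransformPerRecord(string inputPath, string xsltPath,
                                          string outputPath, string recordName)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        using (XmlReader reader = XmlReader.Create(inputPath))
        using (XmlWriter output = XmlWriter.Create(outputPath))
        {
            output.WriteStartDocument();
            output.WriteStartElement("results"); // hypothetical wrapper element

            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                {
                    // Only this one record is held in memory; ReadFrom also
                    // advances the reader past it.
                    var record = (XElement)XNode.ReadFrom(reader);

                    // Transform the single record into a small in-memory result.
                    var result = new XDocument();
                    using (XmlWriter resultWriter = result.CreateWriter())
                    {
                        xslt.Transform(record.CreateReader(), resultWriter);
                    }
                    result.Root?.WriteTo(output);
                }
                else
                {
                    reader.Read();
                }
            }

            output.WriteEndElement();
            output.WriteEndDocument();
        }
    }
}
```

Loading the compiled stylesheet once and reusing it for every record keeps the per-record overhead low.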

Aside from this, the only other option is to throw more hardware at it, but depending on RAM limitations this might even require an operating system upgrade (for example, to a 64-bit OS).

E Rolnicki
  • Not possible in my case; I need to apply a big XSLT. Is there any XML file splitter tool available? – jvm Jun 23 '10 at 11:50
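Rather than a dedicated tool, a splitter is short to write with `XmlReader`. A minimal sketch, assuming the file is a root element containing many repeated record elements; `XmlSplitter`, the record name, the chunk size and the `chunkNNNN.xml` file names are hypothetical. Attributes and namespace declarations on the original root are not copied, and each chunk must still make sense on its own for the XSLT.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class XmlSplitter
{
    // Writes at most chunkSize <recordName> elements per file into outputDir,
    // each file wrapped in an element with the original root's name.
    // Returns the paths of the files created.
    public static List<string> Split(string inputPath, string recordName,
                                     int chunkSize, string outputDir)
    {
        var files = new List<string>();
        XmlWriter writer = null;
        int count = 0;
        string rootLocalName, rootNs;

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent();
            rootLocalName = reader.LocalName;
            rootNs = reader.NamespaceURI;

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
                {
                    if (writer == null)
                    {
                        // Start a new chunk file.
                        string path = Path.Combine(outputDir, $"chunk{files.Count:D4}.xml");
                        files.Add(path);
                        writer = XmlWriter.Create(path);
                        writer.WriteStartElement(rootLocalName, rootNs);
                    }

                    // Copy one record; ReadFrom advances the reader past it.
                    XNode.ReadFrom(reader).WriteTo(writer);

                    if (++count == chunkSize)
                    {
                        // Chunk is full: close it.
                        writer.WriteEndElement();
                        writer.Dispose();
                        writer = null;
                        count = 0;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }

        // Close the last, partially filled chunk.
        if (writer != null)
        {
            writer.WriteEndElement();
            writer.Dispose();
        }
        return files;
    }
}
```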

Don't know if it helps much, but here is some code I use to transform large files:

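   using System.IO;
   using System.Xml.XPath;
   using System.Xml.Xsl;

   // XPathDocument is a read-only in-memory tree optimized for XPath/XSLT
   XPathDocument myXPathDoc = new XPathDocument("xmlfile.xml");
   XslCompiledTransform myXslTrans = new XslCompiledTransform();
   // XsltSettings(enableDocumentFunction, enableScript)
   XsltSettings st = new XsltSettings(true, true);
   myXslTrans.Load("StyleSheet.xslt", st, null);

   XsltArgumentList ln = new XsltArgumentList();
   // some xslt argument processing stuff
   // using ensures the writer is flushed and closed, so the output is not truncated
   using (StreamWriter s = new StreamWriter("output-file.xslt"))
   {
       myXslTrans.Transform(myXPathDoc, ln, s);
   }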

It can take a while but it does seem to get the job done.
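For comparison, the question's code also collects the entire result in an XDocument before printing it. A minimal sketch that writes the result straight to a file instead, using the `XslCompiledTransform.Transform(string inputUri, string resultsFile)` overload; the input is still loaded into memory, so as far as I can tell this only avoids holding a second, output-sized tree. `FileToFileTransform` is a hypothetical name.

```csharp
using System.Xml.Xsl;

static class FileToFileTransform
{
    // Reads the input from disk and writes the result directly to outputPath,
    // instead of building the output as an in-memory XDocument first.
    public static void Run(string inputPath, string xsltPath, string outputPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);
        xslt.Transform(inputPath, outputPath);
    }
}
```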

glenatron