I'm using Saxon to process XQuery in our .NET application. We work with very large XML files (~2GB). When I run the XQuery against one of these files using the Saxon command-line executable directly, the evaluation completes in around 2 minutes. When I run the same evaluation from my C# application, the elapsed time increases to around 10 minutes, and I haven't yet been able to identify what I'm doing wrong.
This is how I run the XQuery with the Saxon executable from the command line:
Query.exe -config:config.xml -q:XQueryTest.txt
These are the contents of config.xml:
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="HE">
    <xquery defaultElementNamespace="http://www.irs.gov/efile"/>
</configuration>
XQueryTest.txt contains the XQuery we are going to process. When running the XQuery from the command line, we modify it to name the file it should run against, using the doc() function. Here is a sample line:
for $ReturnData at $currentReturnDataPos in if(exists(doc("2GB.XML")/Return/ReturnData)) then doc("2GB.XML")/Return/ReturnData else element{'ReturnData'} {''}
As mentioned above, running this command takes about 2 minutes to complete.
This is what I'm doing in my .NET application to perform the same evaluation:
Processor processor = new Processor();
DocumentBuilder documentBuilder = processor.NewDocumentBuilder();
documentBuilder.IsLineNumbering = true;
documentBuilder.WhitespacePolicy = WhitespacePolicy.PreserveAll;
XQueryCompiler compiler = processor.NewXQueryCompiler();
string query = BuildXqueryString();
if (!String.IsNullOrEmpty(query))
{
    XQueryExecutable executable = compiler.Compile(query);
    XQueryEvaluator evaluator = executable.Load();
    using (XmlReader myReader = XmlReader.Create(@"C:\Users\Administrator\Desktop\2GB.xml"))
    {
        evaluator.ContextItem = documentBuilder.Build(myReader);
    }
    var evaluations = evaluator.Evaluate();
}
The problem is in this line: evaluator.ContextItem = documentBuilder.Build(myReader). That is not even the evaluation, just the loading of the file. This line takes far too long to execute, and I need to know whether that is expected or whether there is a way to speed it up. I have tried all the different overloads of the Build() method, and they all take much longer than the 2 minutes the whole run takes from the command line.
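To confirm that the time goes into Build() rather than into compilation or evaluation, each phase can be timed separately. Below is a minimal sketch of how I would measure this; PhaseTimer is a hypothetical helper of my own (not part of Saxon), built only on System.Diagnostics.Stopwatch:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Saxon): times a single phase of the run.
public static class PhaseTimer
{
    // Runs the phase, prints the elapsed wall-clock time, and returns the phase's result.
    public static T Measure<T>(string label, Func<T> phase)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        T result = phase();
        stopwatch.Stop();
        Console.WriteLine("{0}: {1:F1} s", label, stopwatch.Elapsed.TotalSeconds);
        return result;
    }
}
```

In the code above, the load would be wrapped as evaluator.ContextItem = PhaseTimer.Measure("Build", () => documentBuilder.Build(myReader)); and the same wrapper can be put around compiler.Compile(query) and evaluator.Evaluate() to compare the three phases.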
Using Saxon's streaming capability to read the file in parts is not an option for us, because the XQueries we generate can combine information from any part of the XML.