I am using an XSL file to merge multiple XML files. There are around 100 files, and each file has about 4000 nodes. The example XML and XSL are available in this SO question.
My xmlmerge.py is as follows:
import lxml.etree as ET
import argparse
import os
ap = argparse.ArgumentParser()
ap.add_argument("-x", "--xmlreffile", required=True, help="Path to list of xmls")
ap.add_argument("-s", "--xslfile", required=True, help="Path to the xslfile")
args = vars(ap.parse_args())
dom = ET.parse(args["xmlreffile"])
xslt = ET.parse(args["xslfile"])
transform = ET.XSLT(xslt)
newdom = transform(dom)
print(ET.tostring(newdom, pretty_print=True))
I want to write the output of the Python script to an XML file. The command I use to run the script is as follows:
python xmlmerge.py --xmlreffile ~/Documents/listofxmls.xml --xslfile ~/Documents/xslfile.xsl
For 100 files, printing the output to the console takes around 120 minutes. However, if I try to save the same output to an XML file:
python xmlmerge.py --xmlreffile ~/Documents/listofxmls.xml --xslfile ~/Documents/xslfile.xsl >> ~/Documents/mergedxml.xml
After about 3 days the merge had still not finished. I was not sure whether the machine had hung, so I tried with just 8 files on a different machine; after more than 4 hours the merge was still not complete. I don't understand why it takes so much longer when I write to a file than when I print to the console. Can someone guide me?
I am using Ubuntu 14.04 and Python 2.7.
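To narrow down which stage is slow, something like the following sketch could time the parse, transform, and write steps separately. It writes the result straight to a file from Python instead of relying on the shell redirect. The names `timed` and `merge_to_file` and the `out_path` argument are hypothetical and not part of my script. Using `newdom.write(...)` in place of `print` is only a way to measure the difference, not a confirmed fix.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock seconds spent in the block under results[label]."""
    start = time.time()
    try:
        yield
    finally:
        results[label] = time.time() - start


def merge_to_file(xml_path, xsl_path, out_path):
    """Run the XSLT merge and write the result to out_path, timing each stage.

    Hypothetical helper: same lxml calls as xmlmerge.py, except that the
    result tree is written with its own write() method instead of print.
    """
    import lxml.etree as ET  # imported here so timed() works without lxml

    timings = {}
    with timed("parse", timings):
        dom = ET.parse(xml_path)
        transform = ET.XSLT(ET.parse(xsl_path))
    with timed("transform", timings):
        newdom = transform(dom)
    with timed("write", timings):
        # Serialize directly to the file; the file is overwritten on each run.
        newdom.write(out_path, pretty_print=True)

    # Report to stderr so the timings never mix with any XML on stdout.
    for label in ("parse", "transform", "write"):
        sys.stderr.write("%s: %.2f s\n" % (label, timings[label]))
    return timings
```

Calling `merge_to_file` with the same two paths as above plus an output path would show whether the time goes into the transform itself or into writing the output. Separately, `>>` appends to `mergedxml.xml` rather than overwriting it, so repeated runs keep growing the same file, whereas `>` would replace it.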