
I have been working on some scripts to convert a large amount of XML data from format 1 to format 2 so that data can migrate between systems. I'm using Python 3.8 on Windows 10.

This is a one-off job. There has been a huge amount of data incompatibility that I've had to reverse engineer on both systems to make the data compatible, and I've manually translated most of the XML fields. Learning XSLT was too steep a curve for a single job and I don't have the SQL experience to do it that way.

All was going well until the output string reached about 86MB, I think (the limit may be quite a bit lower than this, but it was the first file to fail).

I have built the XML using xml.etree.ElementTree.

I need the XML output pretty-printed and have borrowed a prettify function I found on Stack Overflow that uses minidom (from the question "Use xml.etree.ElementTree to print nicely formatted xml files"), copied here:

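    from xml.dom import minidom
    from xml.etree.ElementTree import tostring

    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        # Serialize, re-parse with minidom, then re-serialize with indentation.
        # This holds several full copies of the document in memory at once.
        rough_string = tostring(elem, 'ISO-8859-1')
        reparsed = minidom.parseString(rough_string)
        return reparsed.toprettyxml(indent="\t")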

My write to file function:

    def write_to_file(root_xml, filenumber):
        # Simply write the XML to the output folder (outputxml is defined elsewhere)
        path = outputxml + "\\" + filenumber + ".xml"
        # The with block closes the file even if the write fails
        with open(path, "w", encoding="UTF-8") as file:
            file.write(prettify(root_xml))

My error:

    File "C:\\mycode.py", line 501, in write_to_file
      file.write(prettify(root_xml))
    MemoryError

I've read that minidom isn't a great way of handling my data and that I probably should not be creating my entire XML in memory. My biggest upcoming XML will probably be about 250MB, maybe even larger, and I'm already failing to write a string of 86MB. It seems like a simple issue, but I'm stuck.

Is there a good workaround for this? I'm really hoping to not have to re-engineer a lot of code to write the XML output in chunks. Is there an easy way to break up the string into smaller pieces and then write to file? Other ideas?

Thanks!

Hot Pot

1 Answer


In case others have a similar issue, I stumbled onto an answer that works for me. The problem only occurred because I was hung up on using "prettify" to write the file. I stopped using prettify, wrapped the root element in an ElementTree, and can now write large XML files with no problem (well, at least a few hundred MB):

    tree = ET.ElementTree(root_xml)
    tree.write("myxmlfile.xml")

I didn't need my pretty printing in the end.
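
For anyone who does still need indentation, one option that avoids the minidom round trip is to add the whitespace to the existing tree in place and then write it with tree.write as above. The following is only a rough sketch, not tested on files of this size: indent_in_place and write_pretty are hypothetical helper names, and it assumes the elements have no meaningful whitespace-only text that must be preserved. On Python 3.9 or later, ET.indent() does the same job, but it isn't available in the 3.8 used here.

```python
import xml.etree.ElementTree as ET


def indent_in_place(elem, level=0, indent="\t"):
    """Add indentation whitespace to elem and its descendants in place.

    Hypothetical helper: it only edits .text and .tail, so no second
    copy of the document is built.
    """
    pad = "\n" + level * indent
    if len(elem):
        # Element has children: start a new line before the first child.
        if not elem.text or not elem.text.strip():
            elem.text = pad + indent
        for child in elem:
            indent_in_place(child, level + 1, indent)
        # The last child's tail positions this element's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    # Every non-root element gets a tail so its next sibling lines up.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


def write_pretty(root, path):
    """Indent the tree in place, then write it straight to the file."""
    indent_in_place(root)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

Calling write_pretty(root_xml, "myxmlfile.xml") would then replace the prettify-based write. Note that the helper is recursive, so an extremely deeply nested document could hit Python's recursion limit.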

Happy for any code geniuses to give feedback/criticisms or suggestions.

Hot Pot