My Python lxml tree expands to 5 GB when I serialize it with etree.tostring(). Linux kills the process because it runs out of memory.
Technically there is no need to build the complete XML in memory, since it's written to a zip archive right away.
Is there a way to serialize the tree as a stream to a zip archive?
Here is my current code (snippet):
import zipfile
from lxml import etree as ET
# Create a zipfile archive
zip_out = zipfile.ZipFile('outputfile.zip', 'w', compression=zipfile.ZIP_DEFLATED)
# serialize lxml etree to string and write to archive
zip_out.writestr('treefile.xml', ET.tostring(large_etree))
One option would be to write the etree to a temporary file and then add that file to the archive, but that's not a great workaround and is probably slow as well.
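For reference, the temp-file workaround I have in mind would look roughly like the sketch below. The helper name `write_tree_via_tempfile`, the small stand-in tree, and the output path in the system temp directory are placeholders for illustration only, and the `ImportError` fallback to the standard library's ElementTree is there just so the snippet runs where lxml is not installed.

```python
import os
import tempfile
import zipfile

try:
    from lxml import etree as ET
except ImportError:
    # Assumption for illustration: fall back to the stdlib ElementTree,
    # which offers the same Element / ElementTree.write() calls used here.
    import xml.etree.ElementTree as ET


def write_tree_via_tempfile(root, zip_path, arcname="treefile.xml"):
    """Hypothetical helper: serialize `root` to a temp file, then zip that file."""
    tree = ET.ElementTree(root)
    fd, tmp_path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)  # only the path is needed; write() opens the file itself
    try:
        # Serialize directly to disk instead of building one large string
        tree.write(tmp_path, encoding="utf-8", xml_declaration=True)
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zip_out:
            # ZipFile.write() copies the file from disk into the archive
            zip_out.write(tmp_path, arcname=arcname)
    finally:
        os.remove(tmp_path)  # clean up the temporary file in all cases


# Small stand-in for the real large tree
root = ET.Element("root")
for i in range(3):
    ET.SubElement(root, "item").text = str(i)

demo_zip = os.path.join(tempfile.gettempdir(), "outputfile.zip")
write_tree_via_tempfile(root, demo_zip)
```

This keeps memory usage low, but it writes the uncompressed XML to disk once and then reads it back, which is the extra I/O I would like to avoid.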