Please note: I am a novice Python user.
Hi,
I am working with an XML file larger than 1 GB, using Python 2.7. Initially, I was using 'iter' to parse the XML. It worked fine with small files, but with a file this big I was getting a memory error. I then read the documentation and found out that 'iter' loads the whole file into memory at once and that I should use iterparse instead. I switched to it and am now able to load the XML file and modify elements while I parse it.
The problem I am facing now is how to write the parsed element tree to a file. The methods I found on Google suggest the 'write' method of ElementTree, which applies to a tree parsed using 'iter', but mine is parsed using iterparse.
Below is my code snippet. I have left out some lines because the inner logic of the code is pretty big. The only part where I am struggling is writing the updated tree to the 'output_pre' file.
The structure of my xml file is like this:
<users>
<user pin=''>
</user>
<user pin=''>
</user>
</users>
Code (inner logic has been removed):
----------------Parser---------------------------
import xml.etree.cElementTree as ET2
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

# analysing, logger, ids, eid, PROV_DATA and container_id_set are defined in the removed logic.
output_pre = open("pre_ouput.xml", 'w')
tree = ET2.iterparse("temp-output-preliminary.xml")
for event, elem in tree:
    if elem.tag == "users":
        pass
    if elem.tag == "user":
        userContent = list(elem)
        # Number of children will help filter dummy users in user-state file.
        numberOfChildren = len(userContent)
        #assert numberOfChildren != 3
        PIN = elem.get('pin')
        assert PIN is not None
        analysing += 1
        logger.info("Analysing user number: %d", analysing)
        if numberOfChildren <= 2:
            pass  # inner logic removed
        if numberOfChildren >= 4:
            pass  # inner logic removed
        if numberOfChildren == 3:
            for e in ids:
                node = ET2.Element("property", {eid: PROV_DATA})
                elem.append(node)
                container_id_set.add(e)
# This is the part I am struggling with: writing the updated tree.
tree.write(output_pre, encoding='unicode')
output_pre.write("\n</perk-users>")
output_pre.close()
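For reference, here is a minimal sketch of the kind of streaming write I think I need, and I am not sure it is the right approach. Instead of calling 'write' on a whole tree, it serializes each 'user' element with ET.tostring as soon as its "end" event arrives, then clears the root so memory does not grow. The names stream_rewrite and edit_user and the "example" attribute are made up for illustration. It assumes that the root element has no attributes and that every 'user' is a direct child of the root.

```python
import xml.etree.ElementTree as ET


def edit_user(elem):
    # Hypothetical stand-in for the removed inner logic:
    # append one <property> child to the user element.
    elem.append(ET.Element("property", {"example": "value"}))


def stream_rewrite(src_path, dst_path, record_tag="user"):
    """Copy src_path to dst_path one record at a time, editing each record."""
    count = 0
    root = None
    with open(dst_path, "wb") as out:
        for event, elem in ET.iterparse(src_path, events=("start", "end")):
            if event == "start":
                if root is None:
                    # The first "start" event belongs to the root element.
                    # Assumption: the root carries no attributes.
                    root = elem
                    out.write(("<%s>\n" % root.tag).encode("utf-8"))
                continue
            if elem.tag == record_tag:
                edit_user(elem)
                # Drop the trailing whitespace (if already parsed) so that
                # every record is followed by exactly one newline.
                elem.tail = None
                out.write(ET.tostring(elem))
                out.write(b"\n")
                count += 1
                # Detach the records processed so far to keep memory flat.
                root.clear()
        if root is not None:
            out.write(("</%s>\n" % root.tag).encode("utf-8"))
    return count
```

Called as stream_rewrite("temp-output-preliminary.xml", "pre_ouput.xml"), this would write the opening root tag, each edited user, and the closing root tag, without ever holding the full tree in memory.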
Thanks!