<Database>
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
    [...]
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
</Database>
The file text.xml is huge (over 15 GB), and I want to split it into smaller files, one per element, from the opening <BlogPost> tag to the closing </BlogPost> tag.
Here is my attempt, but it is taking a long time (over 5 minutes) with no results. Am I doing something fundamentally wrong here?
from lxml import etree

def fast_iter(context, func):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
    for event, elem in context:
        func(elem)
        # Clear the element and delete already-processed preceding siblings
        # so that memory usage stays flat while iterating.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

def process_element(elem):
    print(etree.tostring(elem))

xmlFile = r'D:\Test\Test\text.xml'
context = etree.iterparse(xmlFile, tag='BlogPost')
fast_iter(context, process_element)
I see my IPython process consume more than 2 GB of memory and then finally die, saying my XML file has an invalid line at the end. That makes me wonder: even if my XML file has an extra line, shouldn't the file still be parsed incrementally?
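For reference, the per-post splitting I'm ultimately after would look roughly like this. This is just a sketch using the standard-library parser rather than lxml, and the output filenames are made up; I haven't run it against the full 15 GB file.

```python
import os
import xml.etree.ElementTree as ET

def split_posts(xml_file, out_dir):
    """Stream over the file and write each <BlogPost> to its own small file."""
    i = 0
    # iterparse with 'end' events yields each element once it is fully parsed,
    # so we never need the whole document in memory at once.
    for event, elem in ET.iterparse(xml_file, events=('end',)):
        if elem.tag == 'BlogPost':
            path = os.path.join(out_dir, 'blogpost_%06d.xml' % i)
            with open(path, 'wb') as f:
                f.write(ET.tostring(elem))
            # Drop the element's children to keep memory usage flat.
            elem.clear()
            i += 1
```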