<Database>
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
    [...]
    <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
    </BlogPost>
</Database>
The file text.xml is huge (over 15 GB), and I want to split it into smaller files, one per element, from the opening <BlogPost> tag to the closing </BlogPost> tag.
Here is my attempt, but it is taking a long time (over 5 minutes) with no results. Am I doing something fundamentally wrong here?
from lxml import etree

def fast_iter(context, func):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
    for event, elem in context:
        func(elem)
        # Clear the element and delete already-processed preceding siblings
        # so that memory usage stays flat while iterating.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

def process_element(elem):
    print(etree.tostring(elem))

xmlFile = r'D:\Test\Test\text.xml'
context = etree.iterparse(xmlFile, tag='BlogPost')
fast_iter(context, process_element)
I see my IPython process consume more than 2 GB of memory and then finally die, saying my XML file has an invalid line at the end. That makes me wonder: even if my XML file has an extra line, shouldn't the file still be parsed incrementally?
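For reference, the per-post splitting I'm ultimately after would look roughly like this. This is just a sketch using the standard-library parser rather than lxml, and the output filenames are made up; I haven't run it against the full 15 GB file.

```python
import os
import xml.etree.ElementTree as ET

def split_posts(xml_file, out_dir):
    """Stream over the file and write each <BlogPost> to its own small file."""
    i = 0
    # iterparse with 'end' events yields each element once it is fully parsed,
    # so we never need the whole document in memory at once.
    for event, elem in ET.iterparse(xml_file, events=('end',)):
        if elem.tag == 'BlogPost':
            path = os.path.join(out_dir, 'blogpost_%06d.xml' % i)
            with open(path, 'wb') as f:
                f.write(ET.tostring(elem))
            # Drop the element's children to keep memory usage flat.
            elem.clear()
            i += 1
```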