I have file called Books.xml The Books.xml is huge 2Gb with structure similar to this
<Books>
<Book>
<Detail ID="67">
<BookName>Code Complete 2</BookName>
<Author>Steve McConnell</Author>
<Pages>960</Pages>
<ISBN>0735619670</ISBN>
<BookName>Application Architecture Guide 2</BookName>
<Author>Microsoft Team</Author>
<Pages>496</Pages>
<ISBN>073562710X</ISBN>
</Detail>
</Book>
<Book>
<Detail ID="87">
<BookName>Rocking Python</BookName>
<Author>Guido Rossum</Author>
<Pages>960</Pages>
<ISBN>0735619690</ISBN>
<BookName>Python Rocks</BookName>
<Author>Microsoft Team</Author>
<Pages>496</Pages>
<ISBN>073562710X</ISBN>
</Detail>
</Book>
</Books>
I have tried to split it on the Book tag like this
import xml.etree.cElementTree as etree
filename = r'D:\test\Books.xml'
context = iter(etree.iterparse(filename, events=('start', 'end')))
_, root = next(context)
for event, elem in context:
if event == 'start' and elem.tag == 'Book':
print(etree.dump(elem))
root.clear()
I get the result like this
<Book>
<Detail ID="67">
<BookName>Code Complete 2</BookName>
<Author>Steve McConnell</Author>
<Pages>960</Pages>
<ISBN>0735619670</ISBN>
<BookName>Application Architecture Guide 2</BookName>
<Author>Microsoft Team</Author>
<Pages>496</Pages>
<ISBN>073562710X</ISBN>
</Detail>
</Book>
None
<Book>
<Detail ID="87">
<BookName>Rocking Python</BookName>
<Author>Guido Rossum</Author>
<Pages>960</Pages>
<ISBN>0735619690</ISBN>
<BookName>Python Rocks</BookName>
<Author>Microsoft Team</Author>
<Pages>496</Pages>
<ISBN>073562710X</ISBN>
</Detail>
</Book>
None
- How do i get rid of the None
- I would like to store the fragments broken up on book into some sort of queue and then have another program dequeue it.