(A)
Well, since you said ANY help, here's my shot.
From my experience, you'll be much more satisfied poking around with
obj.__dict__
and seeing how each XML element fits together. This way you effectively spell-check the entire XML file: if the iteration below runs cleanly, every tag and attribute is where you expect it.
I took your example data, saved it as an .xml file, and loaded it in a Python 2.7 interpreter. Here's how I worked out what code to use:
>>> import xml.etree.ElementTree as ET
>>> some_tree = ET.parse("/Users/pro/Desktop/tech/test_scripts/test.xml")
>>> for block_number in range(0, len(some_tree._root.getchildren())):
...     print "block_number: " + str(block_number)
...
block_number: 0
block_number: 1
block_number: 2
>>> some_tree._root.getchildren()
[<Element 'title' at 0x101a59450>, <Element 'fulltext' at 0x101a59550>, <Element 'figures' at 0x101a59410>]
>>> some_tree._root.__dict__
{'text': '\n', 'attrib': {'pmid': '19243591', 'doi': '10.1186/1472-6963-9-38', 'pmcid': '2653499'}, 'tag': 'article', '_children': [<Element 'title' at 0x101a59450>, <Element 'fulltext' at 0x101a59550>, <Element 'figures' at 0x101a59410>]}
>>> some_tree._root.attrib
{'pmid': '19243591', 'doi': '10.1186/1472-6963-9-38', 'pmcid': '2653499'}
>>> some_tree._root.attrib['pmid']
'19243591'
>>> to_store = {}
>>> to_store[some_tree._root.attrib['pmid']] = []
>>> some_tree._root.getchildren()
[<Element 'title' at 0x101a59450>, <Element 'fulltext' at 0x101a59550>, <Element 'figures' at 0x101a59410>]
>>> some_tree._root[2]
<Element 'figures' at 0x101a59410>
>>> some_tree._root[2].__dict__
{'text': '\n', 'attrib': {}, 'tag': 'figures', 'tail': '\n', '_children': [<Element 'figure' at 0x101a595d0>, <Element 'figure' at 0x101a59650>]}
>>> some_tree._root[2].getchildren()
[<Element 'figure' at 0x101a595d0>, <Element 'figure' at 0x101a59650>]
>>> for r in range(0, len(some_tree._root[2].getchildren())):
...     print some_tree._root[2].getchildren()[r]
...
<Element 'figure' at 0x101a595d0>
<Element 'figure' at 0x101a59650>
>>> some_tree._root[2].getchildren()[1].__dict__
{'attrib': {'iri': '1472-6963-9-38-1'}, 'tag': 'figure', 'tail': '\n', '_children': [<Element 'caption' at 0x101a59690>]}
>>> for r in range(0, len(some_tree._root[2].getchildren())):
...     to_store[to_store.keys()[0]].append(some_tree._root[2].getchildren()[r].attrib['iri'])
...
>>> to_store
{'19243591': ['1472-6963-9-38-2', '1472-6963-9-38-1']}
>>>
Note that to_store is arbitrary; it's just a convenient stand-in for however you want to store those x,y pairs (here, pmid and iri).
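One caveat on the session above: it leans on the private `_root` attribute and on `getchildren()`, which was removed in Python 3.9. A rough Python 3 equivalent using only the public ElementTree API might look like the sketch below. The inline XML is an assumption, reconstructed from the output above with placeholder text, and `SAMPLE_XML` and `collect_figure_iris` are names I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample: same shape as the REPL output above, with placeholder text.
SAMPLE_XML = """<article pmid="19243591" doi="10.1186/1472-6963-9-38" pmcid="2653499">
<title>placeholder title</title>
<fulltext>placeholder text</fulltext>
<figures>
<figure iri="1472-6963-9-38-2"><caption>placeholder</caption></figure>
<figure iri="1472-6963-9-38-1"><caption>placeholder</caption></figure>
</figures>
</article>"""


def collect_figure_iris(root):
    """Map the article's pmid to the iri of every <figure> under <figures>."""
    pmid = root.attrib["pmid"]
    # findall with a path returns matching elements in document order.
    return {pmid: [fig.attrib["iri"] for fig in root.findall("figures/figure")]}


root = ET.fromstring(SAMPLE_XML)  # for a file: ET.parse(path).getroot()

# Iterating an element yields its direct children (replaces getchildren()).
for block_number, child in enumerate(root):
    print("block_number:", block_number, child.tag, child.attrib)

to_store = collect_figure_iris(root)
print(to_store)
```

The public attributes `tag`, `attrib`, `text` and `tail` give you the same information the `__dict__` dumps showed, so the prodding-around approach still works.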
(B)
I really like outputting to my own SQLite flat-file db. I did that when translating the entire Bible for use at runtime in an iOS app I released. Here's some example code for the SQL side:
import sqlite3
bible_books = ["genesis", "exodus", "leviticus", "numbers", "deuteronomy",
"joshua", "judges", "ruth", "1 samuel", "2 samuel", "1 kings",
"2 kings", "1 chronicles", "2 chronicles", "ezra", "nehemiah",
"esther", "job", "psalms", "proverbs", "ecclesiastes",
"song of solomon", "isaiah", "jeremiah", "lamentations",
"ezekiel", "daniel", "hosea", "joel", "amos", "obadiah",
"jonah", "micah", "nahum", "habakkuk", "zephaniah", "haggai",
"zechariah", "malachi", "matthew", "mark", "luke", "john",
"acts", "romans", "1 corinthians", "2 corinthians",
"galatians", "ephesians", "philippians", "colossians",
"1 thessalonians", "2 thessalonians", "1 timothy",
"2 timothy", "titus", "philemon", "hebrews", "james",
"1 peter", "2 peter", "1 john", "2 john", "3 john",
"jude", "revelation"]
chapter_counts = {bible_books[0]:50, bible_books[1]:40, bible_books[2]:27,
bible_books[3]:36, bible_books[4]:34, bible_books[5]:24,
bible_books[6]:21, bible_books[7]:4, bible_books[8]:31,
bible_books[9]:24, bible_books[10]:22, bible_books[11]:25,
bible_books[12]:29, bible_books[13]:36, bible_books[14]:10,
bible_books[15]:13, bible_books[16]:10, bible_books[17]:42,
bible_books[18]:150, bible_books[19]:31, bible_books[20]:12,
bible_books[21]:8, bible_books[22]:66, bible_books[23]:52,
bible_books[24]:5, bible_books[25]:48, bible_books[26]:12,
bible_books[27]:14, bible_books[28]:3, bible_books[29]:9,
bible_books[30]:1, bible_books[31]:4, bible_books[32]:7,
bible_books[33]:3, bible_books[34]:3,
bible_books[35]:3, bible_books[36]:2, bible_books[37]:14,
bible_books[38]:4, bible_books[39]:28, bible_books[40]:16,
bible_books[41]:24, bible_books[42]:21, bible_books[43]:28,
bible_books[44]:16, bible_books[45]:16, bible_books[46]:13,
bible_books[47]:6, bible_books[48]:6, bible_books[49]:4,
bible_books[50]:4, bible_books[51]:5, bible_books[52]:3,
bible_books[53]:6, bible_books[54]:4, bible_books[55]:3,
bible_books[56]:1, bible_books[57]:13, bible_books[58]:5,
bible_books[59]:5, bible_books[60]:3, bible_books[61]:5,
bible_books[62]:1, bible_books[63]:1, bible_books[64]:1,
bible_books[65]:22}
conn = sqlite3.connect("bible_web.sqlite3")
c = conn.cursor()
# First pass: create one table per chapter, e.g. b_1_samuel_3.
for i_book in bible_books:
    book_name = "b_" + i_book.lower().replace(" ", "_")
    for i_chapter in range(1, chapter_counts[i_book]+1):
        c.execute("create table " + book_name + "_" + str(i_chapter) + " (verse real primary key, value text)")
# Second pass: strip surrounding whitespace from every stored verse.
for i_book in bible_books:
    book_name = "b_" + i_book.lower().replace(" ", "_")
    for i_chapter in range(1, chapter_counts[i_book]+1):
        # Optional per-verse loop, left disabled:
        #c.execute("SELECT Count(*) FROM " + book_name + "_" + str(i_chapter))
        #i_rows = c.fetchone()[0]
        #for verse_number in range(1, i_rows+1):
        c.execute("update " + book_name + "_" + str(i_chapter) + " set value=trim(value)")
conn.commit()
c.close()
conn.close()
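To tie this back to part (A), the same sqlite3 module could hold the pmid/iri pairs. This is only a sketch under my own assumptions: the `figures` table, its columns, and the in-memory database are hypothetical, not something taken from your data or from my Bible script. It passes values through `?` placeholders instead of string concatenation, which sqlite3 supports for values (though not for table names).

```python
import sqlite3

# Same shape as to_store from part (A).
to_store = {"19243591": ["1472-6963-9-38-2", "1472-6963-9-38-1"]}

conn = sqlite3.connect(":memory:")  # use a filename to persist to disk
c = conn.cursor()

# Hypothetical schema: one row per (article, figure) pair.
c.execute("create table figures (pmid text, iri text, primary key (pmid, iri))")

# Flatten the dict into (pmid, iri) tuples and insert them in one call.
rows = [(pmid, iri) for pmid, iris in to_store.items() for iri in iris]
c.executemany("insert into figures (pmid, iri) values (?, ?)", rows)
conn.commit()

# Read the figures for one article back, sorted by iri.
c.execute("select iri from figures where pmid = ? order by iri", ("19243591",))
stored = [row[0] for row in c.fetchall()]
print(stored)

c.close()
conn.close()
```

A single table keyed on (pmid, iri) keeps lookups simple here; the table-per-chapter layout above made sense for my app, but you probably don't need it for article metadata.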
Just some ideas. Hope that helps.