Switching from amara to lxml in Python

Question

I am trying to do something like this with the lxml library: http://www.xml.com/pub/a/2005/01/19/amara.html

from amara import binderytools

container = binderytools.bind_file('labels.xml')
for l in container.labels.label:
    print l.name, 'of', l.address.city

but I have had the hardest time getting my feet wet! What I want to do is: descend to the root node named 'X', then descend to its second child named 'Y', then grab all of its children named 'Z', then of those keep only the children that have an attribute 'name' set to 'bacon', then for each remaining node look at all of its children named 'W', and keep only a subset based on some filter, which looks at W's only children, named A, B, and C. Then I need to process them with the following (non-optimized) pseudo-code:

result = []
X = root(doc(parse(xml_file_name)))
Y = X[1] # Second child
Zs = Y.children()
for Z in Zs:
    if Z.name != 'bacon': continue # skip
    Ws = Z.children()
    record = []
    assert(len(Ws) == 9)
    W0 = Ws[0]
    assert(W0.A == '42')
    record.append(str(W0.A) + " " + W0.B + " " + W0.C)
    ...
    W1 = Ws[1]
    assert(W1.A == '256')
    ...
    result.append(record)

This is sort of what I am trying to accomplish. Before I try to make this code cleaner, I would like to make it work.

Please help, as I am lost in this API. Let me know if you have questions.

unutbu · Accepted Answer · 2010-11-23T02:06:19.893

import lxml.etree as le
import io

content='''\
<foo><X><Y>skip this</Y><Y><Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
</Y></X></foo>
'''
# BytesIO needs bytes; encoding the string keeps this working on Python 3 as well
doc=le.parse(io.BytesIO(content.encode('utf-8')))
# print(le.tostring(doc, pretty_print=True))
result=[]
Zs=doc.xpath('//X/Y[2]/Z[@name="bacon"]')
for Z in Zs:
    Ws=Z.xpath('W')
    record=[]
    assert(len(Ws)==2)  #<--- Change to 9
    abc=Ws[0].xpath('descendant::text()')
    # print(abc)
    # ['42', 'b', 'c']
    assert(abc[0] == '42')
    record.append(' '.join(abc))
    abc=Ws[1].xpath('descendant::text()')
    assert(abc[0] == '256')
    result.append(record)
print(result)
# [['42 b c'], ['42 b c']]

This might be a way to tighten up the inner loop, though I'm only guessing which records you wish to keep:

for Z in Zs:
    Ws=Z.xpath('W')
    assert(len(Ws)==2)  #<--- Change to 9
    a_vals=('42','256')
    for W,a_val in zip(Ws,a_vals):
        abc=W.xpath('descendant::text()')
        assert(abc[0] == a_val)
        result.append([' '.join(abc)])
print(result)
# [['42 b c'], ['256 b c'], ['42 b c'], ['256 b c']]
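
The loops above rely on A, B, and C appearing in that order inside each W. If the order is not guaranteed, reading each child by name may be safer. The following is only a sketch under assumptions: the sample document is made up (note that B comes before A), and `w_record` is a hypothetical helper, not part of lxml. It uses `findall` and `findtext`, which lxml shares with the standard library's ElementTree, so the import falls back to ElementTree where lxml is not installed.

```python
import io

try:
    import lxml.etree as le
except ImportError:
    # The calls used below also exist in the standard library
    import xml.etree.ElementTree as le

# Made-up sample: B comes before A inside W
content = b'''\
<foo><X><Y>skip this</Y><Y>
<Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><B>b</B><A>42</A><C>c</C></W></Z>
</Y></X></foo>
'''

def w_record(w):
    # Hypothetical helper: join the text of the children A, B, C, looked up by name.
    # findtext returns the text of the first matching child, or the default.
    return ' '.join(w.findtext(tag, default='') for tag in ('A', 'B', 'C'))

doc = le.parse(io.BytesIO(content))
# Same selection as the XPath above, written as an ElementPath expression
Ws = doc.findall('.//X/Y[2]/Z[@name="bacon"]/W')
records = [w_record(W) for W in Ws]
print(records)
# ['42 b c']
```

Looking children up by name also gives a natural place for the filter: test `W.findtext('A')` before appending the record.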

+1 I started writing an answer when I realized my code looked about 90% similar to yours, and in the 10% that was different, yours looked better. — snapshoe, Nov 23 '10 at 03:12
I would like to ask a follow-up question. If I am looking at something like `abc...`, then how can I look at `column.id` and `column.name`? I have already seen one example: `Zs=doc.xpath('//X/Y[2]/Z[@name="bacon"]')`, but what I need is the ability to look at individual attributes at a node that I have already "selected". — Hamish Grubijan, Nov 23 '10 at 15:36
@Hamish Grubijan: Each `ET._Element` node `column` has a Python attribute `.attrib` which contains all the XML node attributes in a `dict`. So you can look at the value of `id` with `column.attrib['id']`. — unutbu, Nov 23 '10 at 16:01
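
A minimal sketch of reading attributes from nodes that have already been selected. The `<table>`/`<column>` markup is an assumption, since the XML from the follow-up question is not shown. `column.get('name')` is a variant of `column.attrib['name']` that returns `None` instead of raising `KeyError` when the attribute is missing.

```python
try:
    import lxml.etree as le
except ImportError:
    # .attrib and .get() behave the same way in the standard library
    import xml.etree.ElementTree as le

# Assumed markup, for illustration only
root = le.fromstring(
    '<table><column id="c1" name="price"/><column id="c2"/></table>'
)

pairs = []
for column in root.findall('column'):
    col_id = column.attrib['id']   # dict-style access; KeyError if absent
    col_name = column.get('name')  # None if the attribute is absent
    pairs.append((col_id, col_name))

print(pairs)
# [('c1', 'price'), ('c2', None)]
```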
