Python extract nodes containing tag using ElementTree

Question

I need to extract a few nodes from an XML file if one of them contains a keyword. I've got to the point where the keyword is printed if it is found. Now comes the tricky part (at least for me ;-)), which I explain in more detail below. XML:

<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://url">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1200</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

My script prints the Amount value if it is found and equals 1853. What I actually need is this: when 1853 is found, the script should extract the whole <Offers> element to a new file. I got the script running and then got stuck. I really have no clue how to get back from <Amount> and copy the whole <Offers> group.

Script 1:

import xml.etree.ElementTree as ET
import sys

name = sys.argv[1].strip()      # value to search for, e.g. '1853'
filename = sys.argv[2].strip()  # output file name (not used yet)

# ET.parse accepts a path directly, so no file handle is left open
element = ET.parse("sample.xml")

for elem in element.iter():
    if elem.tag == '{http://url}Price':
        output = {}
        for elem1 in list(elem):
            if elem1.tag == '{http://url}Amount':
                if elem1.text == name:
                    output['Amount'] = elem1.text
                    print(output)  # parenthesized form runs on Python 2 and 3

And my output:

python sample1.py '1853' x
{'Amount': '1853'}
{'Amount': '1853'}

The 'x' argument is not relevant here.

How do I get back from <Amount> and copy the whole <Offers> group to a new file, or just print it out? It needs to be done with ElementTree.

Only ElementTree? The package at http://pythonhosted.org/pyquery/ is fun for this kind of thing; it is a jQuery-like system. – Philippe T., Sep 05 '13 at 10:06

score 3 · Accepted Answer · edited May 23 '17 at 12:05

What about this:

import xml.etree.ElementTree as ET
import sys

name = sys.argv[1].strip()      # value to search for, e.g. '1853'
filename = sys.argv[2].strip()  # not used in this snippet

tree = ET.parse("sample.xml")
root = tree.getroot()

# Look at every <Offers> group and check all <Amount> elements below it
for offers in root.findall('.//{http://url}Offers'):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
        # tostring() returns bytes on Python 3, so decode before printing
        print(ET.tostring(offers).decode())

Prints

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>

<url:Offers xmlns:url="http://url">
    <url:Offer>
      <url:OfferListing>
        <url:Price>
          <url:Amount>1853</url:Amount>
        </url:Price>
      </url:OfferListing>
    </url:Offer>
  </url:Offers>

To write to files, you can do something like this (borrowed from this answer):

for i, offers in enumerate(root.findall('.//{http://url}Offers'), start=1):
    value_found = False
    for amount in offers.findall('.//{http://url}Amount'):
        if amount.text == name:
            value_found = True
            break
    if value_found:
        tree = ET.ElementTree(offers)
        tree.write("offers%d.xml" % i,
           xml_declaration=True, encoding='utf-8',
           method="xml", default_namespace='http://url')

which writes files like:

<?xml version='1.0' encoding='utf-8'?>
<Offers xmlns="http://url">
    <Offer>
      <OfferListing>
        <Price>
          <Amount>1853</Amount>
        </Price>
      </OfferListing>
    </Offer>
  </Offers>
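The written files have no namespace prefix because tree.write() was given default_namespace. ET.tostring() has no such argument on every Python version, so for in-memory strings a similar effect can be sketched with ET.register_namespace(), which maps a URI to a prefix for serialization. This is a Python 3 sketch under the assumption that an empty prefix is acceptable for the whole program, since the registry is global to the module.

```python
import xml.etree.ElementTree as ET

# Map the document's namespace to the empty prefix for serialization.
# Note: this registration is global and affects all later serialization.
ET.register_namespace("", "http://url")

# Stand-in for one <Offers> element found by the loop above.
offers = ET.fromstring(
    '<Offers xmlns="http://url"><Offer><OfferListing><Price>'
    "<Amount>1853</Amount></Price></OfferListing></Offer></Offers>"
)

# encoding="unicode" makes tostring() return str instead of bytes on Python 3.
text = ET.tostring(offers, encoding="unicode")
print(text)
```

With the registration in place, the serialized string declares xmlns="http://url" on the top element instead of using a generated prefix.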

answered Sep 05 '13 at 10:28

paul trmbrth


It's not like that. I'm looking for the Offers with 1853 in Amount. If found, I need to extract the whole <Offers> with its child nodes to a new file. So, when 1853 is given, two groups should be extracted, one for each Amount of 1853. I also thought about xml.dom, but I'm not sure I'm thinking about this the right way. – jakkolwiek Sep 05 '13 at 10:31
My bad. I removed the 2nd break and called ET.tostring(offers) – paul trmbrth Sep 05 '13 at 10:43
Yep, this is just perfect! I see I still need to learn about enumerate to fully understand it, but thank you a lot! This is a great help! – jakkolwiek Sep 05 '13 at 11:38
@jakkolwiek, enumerate() is simply a very neat helper to count in loops. My greatest discovery lately was the "start" parameter ;) – paul trmbrth Sep 05 '13 at 11:42
Actually, I still don't get one thing. Let's say my source XML has about 300 tags with value = 1853. They are all printed nicely in the terminal, but only the last one is written to the file. I've also tried to stream the strings to a file but still can't get it right. In the terminal everything is fine, but only the last record ends up in the file. – jakkolwiek Sep 05 '13 at 14:24
Can you post a more complete XML with the case you mention? – paul trmbrth Sep 05 '13 at 14:35
It's still the same case, even using the code you posted. In the terminal it's perfect, but only the last found tag is written to the file, and I just can't understand why, especially since it's printed correctly. The XML is actually the same too, just with a few more groups with various Amounts. The XML can have, say, 5 Amounts = 1853 and 6 others. The script correctly prints the 5 with 1853 on screen, but the file ends up with only the last one. – jakkolwiek Sep 05 '13 at 14:53
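The cause of the "only the last record" symptom is not confirmed anywhere in this thread. One possibility, assuming the same output filename is opened in write mode on every loop iteration, is that each match overwrites the previous one. A minimal Python 3 sketch that avoids this collects every matching <Offers> under a single wrapper element and writes once. The wrapper name MatchingOffers, the helper `write_matching_offers`, and the shortened sample are hypothetical and not part of the original XML or answer.

```python
import io
import xml.etree.ElementTree as ET

URI = "http://url"
NS = "{%s}" % URI

# Shortened stand-in for sample.xml (assumption: same namespace and nesting).
SAMPLE = """<ItemSearchResponse xmlns="http://url"><Items><Item>
<Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
<Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
<Offers><Offer><OfferListing><Price><Amount>1200</Amount></Price></OfferListing></Offer></Offers>
</Item></Items></ItemSearchResponse>"""


def write_matching_offers(root, wanted, target):
    """Write all <Offers> containing an <Amount> equal to `wanted` in one document.

    `target` may be a filename or a binary file object. Returns the match count.
    """
    # Hypothetical wrapper element, placed in the same namespace so that
    # default_namespace can be used when writing.
    wrapper = ET.Element(NS + "MatchingOffers")
    for offers in root.iter(NS + "Offers"):
        if any(amount.text == wanted for amount in offers.iter(NS + "Amount")):
            wrapper.append(offers)
    # A single write call, so earlier matches cannot be overwritten.
    ET.ElementTree(wrapper).write(
        target, xml_declaration=True, encoding="utf-8", default_namespace=URI
    )
    return len(wrapper)


root = ET.fromstring(SAMPLE)
buffer = io.BytesIO()  # in-memory target; pass a filename to write to disk
count = write_matching_offers(root, "1853", buffer)
print(count)  # 2 for this sample
```

If one file per match is preferred instead, the answer's approach of putting the enumerate() index into the filename gives each match its own file.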
