22

I need to check whether a certain tag exists in an xml file.

For example, I want to see if the tag exists in this snippet:

 <main>
       <elem1/>
       <elem2>Hi</elem2>
       <elem3/>
       ...
 </main>

Currently, I am using an ugly hack with error checking, like this:

try:
   if root.elem1.tag:
      foo = elem1
except AttributeError:
   foo = "error finding elem1"

I also want to customize the string if it is unable to find the node (i.e. "unable to find -tagname-").

I have to check a long list of variables, and I don't want to repeat the code 100 times.

Any suggestions?

Edit:

Here is a snip of the actual xml file:

<main>
 <asset name="Virtual Dvaered Unpresence">
  <virtual/>
  <presence>
   <faction>Dvaered</faction>
   <value>-1000.000000</value>
   <range>0</range>
  </presence>
 </asset>
 <asset name="Virtual Empire Small">
  <virtual/>
  <presence>
   <faction>Empire</faction>
   <value>100.000000</value>
   <range>2</range>
  </presence>
 </asset>
</main>

I want to check whether the tag exists, and, if so, to get the contents.

Edit edit: Ok, I am going to combine two of the answers, but I can only vote for one. Sorry.

Edit 3: Related question about XPath here: Python lxml (objectify): Xpath troubles

Community
  • 1
  • 1
Biosci3c
  • 772
  • 5
  • 15
  • 35

4 Answers4

38

hasattr() works for this:

if hasattr(root, 'elem1'):
    foo = root.elem1
Shadow The GPT Wizard
  • 66,030
  • 26
  • 140
  • 208
Sleep_Walker
  • 381
  • 3
  • 2
  • 7
    This is the answer I like. It's still ugly, but that's Python's fault, not the poster's. I just want to check for child existance, not kick off a full-strength xpath processor. – odigity May 29 '13 at 20:03
  • 4
    Note that internally hasattr works by calling getattr and catching exceptions, so it's just as ugly inside as out (at least it was last time I checked) :) – Ezekiel Kruglick Feb 24 '15 at 06:21
8

Edit: updated answer for sample file.

I'm assuming you want to search each asset for certain tags. If so, the following worked for me:

import lxml.objectify

# Parse the file.
tree = lxml.objectify.parse('sample.xml')
root = tree.getroot()

# Which elements to find.
to_find = set(['presence/faction', 'presence/value', 'fake'])

# Go through each asset in the document.
for asset in root.findall('asset'):
    # Check for each element. 
    for name in to_find:
        node = asset.find(name)
        if node is not None:
            print 'Found %s, its value is %s' % (name, node)
        else:
            print 'Unable to find %s' % name

The output was:

Found presence/value, its value is -1000.0
Found presence/faction, its value is Dvaered
Unable to find fake
Found presence/value, its value is 100.0
Found presence/faction, its value is Empire
Unable to find fake
Blair
  • 15,356
  • 7
  • 46
  • 56
  • This looks like it will work great. I'll try it when I get the chance. Just to clarify, are you using set() with a list as an argument? – Biosci3c Mar 22 '11 at 05:41
  • Yes. The constructor takes an iterable to give the initial entries in the set. See [the docs](http://docs.python.org/library/stdtypes.html#set) for details. – Blair Mar 22 '11 at 23:01
  • Ok, one issue. How can I make this assign values to particular variables (i.e. var_fac = presence/faction, var_value = presence/value? – Biosci3c Mar 27 '11 at 22:24
  • 1
    I'd use a dictionary. `if node is not None: values[name] = node` etc. – Blair Mar 27 '11 at 23:59
  • Oh, I get it, store the values in a dict. I like the idea. – Biosci3c Mar 28 '11 at 02:44
7

Assume you want to get elem2's value, you can use xpath to find it.

tree = etree.parse(StringIO(htmlString), etree.HTMLParser()).getroot()
youWantValue = tree.xpath('/main/elem2')[0].text
Daniel
  • 324
  • 3
  • 10
  • What happens if the node does not exist? Does it give an error, or simply a blank value? – Biosci3c Mar 22 '11 at 05:44
  • @Biosci3c that specific example gives an error, due to the `[0]` trying to dereference the first value returned by the `xpath` call. If you checked whether the list was empty before dereferencing, on the other hand, you'd have a test with no error. With that correction made, btw, I find this to be the best-practices answer among those given. – Charles Duffy Mar 22 '11 at 11:45
  • Ok, I like the XPATH suggestion, so I will use that too. BTW, I think you are missing a closing parenthesis at the end of the top line. – Biosci3c Mar 28 '11 at 02:39
  • Ah, so I can check whether list has length 0. Good idea. – Biosci3c Mar 28 '11 at 02:47
  • Ok, I tried a version of your method, but I am getting list index out of range. I'll post my updated code above. – Biosci3c Mar 29 '11 at 02:54
2

If your document tends to be relatively short you can iterate over all children of <main> looking for tags matching your set of variable names:

tree = lxml.etree.fromstring(DATA)
NAMES = set(['elem1', 'elem3'])
for node in tree.iterchildren():
    if node.tag in NAMES:
        print 'found', node.tag

Or you can search for each variable name one at a time:

for tag in ('elem1', 'elem3'):
    if tree.find(tag) is not None:
        print 'found', tag
samplebias
  • 37,113
  • 6
  • 107
  • 103