XML parsing specific values - Python

Question

I've been attempting to parse a list of XML files. I'd like to print specific values, such as the userName value.

<?xml version="1.0" encoding="utf-8"?>
<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}" 
        disabled="1">
  <Drive clsid="{935D1B74-9CB8-4e3c-9914-7DD559B7A417}" 
         name="S:" 
         status="S:" 
         image="2" 
         changed="2007-07-06 20:57:37" 
         uid="{4DA4A7E3-F1D8-4FB1-874F-D2F7D16F7065}">
    <Properties action="U" 
                thisDrive="NOCHANGE" 
                allDrives="NOCHANGE" 
                userName="" 
                cpassword="" 
                path="\\scratch" 
                label="SCRATCH" 
                persistent="1" 
                useLetter="1" 
                letter="S"/>
  </Drive>
</Drives>

My script collects the list of XML files without problems. The function below is meant to print the relevant values, and I'm trying to do this as suggested in this post. However, I'm clearly doing something wrong, because I get errors saying that the elm object has no attribute text. Any help would be appreciated.

Current Code

from lxml import etree as ET

def read_files(files):
    for fi in files:
        doc = ET.parse(fi)
        elm = doc.find('userName')
        print elm.text

score 1 · Answer 1 · answered Sep 09 '14 at 19:39

userName is an attribute, not an element. Attributes don't have text nodes attached to them at all.

for el in doc.xpath('//*[@userName]'):
    print(el.attrib['userName'])

Charles Duffy

Could you please explain the `//*[@userName]` value? I'd like to understand how to add multiple attributes. – iNoob Sep 09 '14 at 19:49
@iNoob, you should read the `etree` docs for how to use an XPath-like specification to find tags with specific attributes: https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-xpath – Dan Lenski Sep 09 '14 at 19:54
@DanLenski, ...oh, that's a fair objection -- I was using real XPath here; in practice, I use `lxml.etree`, not the standard-library ElementTree. – Charles Duffy Sep 09 '14 at 19:56
@DanLenski, ...actually, rereading the question, the OP is using lxml.etree, so it's actually a fair call. :) – Charles Duffy Sep 09 '14 at 19:57
Oh, I'm not objecting! I prefer `lxml.etree` myself. I simply pointed the OP to the stdlib `etree` docs because they give a simple, concise summary of the XPath strings, and the `lxml` version is basically a superset of that behavior. – Dan Lenski Sep 09 '14 at 19:58
@iNoob, `@userName` means "having an attribute called userName". `//` does a recursive search. `*` matches an element with any name. That said, if you know that the element with the properties you want will always be called Properties, then you don't need to search off of which attributes it has. – Charles Duffy Sep 09 '14 at 19:58
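
To make the pieces of that expression concrete, and to show one way of asking for several attributes at once, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree so that it runs without installing anything; lxml.etree accepts the same paths in findall(). The helper name attributes_of_matches and the userName value example_user are made up for illustration, and the sample is a trimmed copy of the file in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file; "example_user" is a made-up value
# so that there is something visible to print.
SAMPLE = r"""
<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}" disabled="1">
  <Drive name="S:" status="S:">
    <Properties action="U" userName="example_user" cpassword=""
                path="\\scratch" label="SCRATCH" letter="S"/>
  </Drive>
</Drives>
"""


def attributes_of_matches(root, path, names):
    """Return one dict per element matched by `path`, holding the requested attributes."""
    return [{name: el.get(name) for name in names} for el in root.findall(path)]


root = ET.fromstring(SAMPLE.strip())

# ".//" searches recursively, "*" matches any tag name,
# and "[@userName]" keeps only elements that carry that attribute.
any_tag = attributes_of_matches(root, ".//*[@userName]", ["userName"])

# Predicates can be chained to require several attributes at once.
both = attributes_of_matches(
    root, ".//*[@userName][@cpassword]", ["userName", "cpassword", "path"]
)

# If the element is always called Properties, name it instead of using "*".
by_tag = attributes_of_matches(root, ".//Properties", ["userName", "label"])

print(any_tag)
print(both)
print(by_tag)
```

An attribute that is present but empty, like cpassword here, still satisfies the `[@cpassword]` predicate; the predicate only checks that the attribute exists.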

score 1 · Accepted Answer · answered Sep 09 '14 at 19:39

doc.find looks for a tag with the given name. You are looking for an attribute with the given name.

elm.text raises an error because doc.find doesn't find a matching tag, so it returns None, and None has no text attribute.

Read the lxml.etree docs some more, and then try something like this:

doc = ET.parse(fi)
root = doc.getroot()
prop = root.find(".//Properties") # finds the first <Properties> tag anywhere
elm = prop.attrib['userName']
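
The snippet above raises an error for any file that has no Properties tag, for the same reason as the original code: find returns None. A minimal sketch of a more defensive version follows. It assumes the standard library's xml.etree.ElementTree (lxml.etree offers the same parse, find and get calls); the function name read_user_name and the userName value example_user are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def read_user_name(source):
    """Return the userName attribute of the first <Properties> element, or None."""
    doc = ET.parse(source)  # `source` may be a file path or a binary file object
    prop = doc.getroot().find(".//Properties")
    if prop is None:  # find() returns None when no element matches
        return None
    return prop.get("userName")  # get() returns None if the attribute is absent


# In-memory stand-ins for files on disk, for illustration only.
with_props = io.BytesIO(
    b'<Drives><Drive name="S:">'
    b'<Properties userName="example_user" letter="S"/>'
    b"</Drive></Drives>"
)
without_props = io.BytesIO(b'<Drives><Drive name="S:"/></Drives>')

found = read_user_name(with_props)
missing = read_user_name(without_props)

# Searching for the attribute name as if it were a tag finds nothing.
as_tag = ET.fromstring('<Properties userName="example_user"/>').find("userName")

print(found, missing, as_tag)
```

Using get instead of attrib[...] trades a KeyError for a None result when the attribute is absent, which is usually easier to handle when looping over many files.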

Dan Lenski

Eh? It's not the root that the attribute hangs off of. – Charles Duffy Sep 09 '14 at 19:40
You're quite right, I fixed my answer to look for the attribute of the `Properties` tag instead. – Dan Lenski Sep 09 '14 at 19:42

score 0 · Answer 3 · answered Sep 09 '14 at 19:42

You can get the element by its tag name and then read its attribute (userName is an attribute of Properties):

from xml.dom import minidom

def read_files(files):
    for fi in files:
        # getElementsByTagName and .attributes are DOM methods, so parse with minidom
        doc = minidom.parse(fi)
        props = doc.getElementsByTagName('Properties')
        elm = props[0].attributes['userName']
        print(elm.value)
