I have an XSD file of the following format:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>

and the following XML file:

<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>

I'd like to retrieve the text inside the xsd:description element that sits under the xsd:type element with the name="type1" attribute. In other words, I'd like to retrieve "This is the description of said type1 tag".

I have tried to do this with lxml in the following way using Python:

from lxml import etree
XSDDoc = etree.parse(xsdFile)
root = XSDDoc.getroot()
result = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", root.nsmap)

I've used the same example and solution mentioned here. However, my code just returns an empty list, and I'm not able to retrieve the description.

For reference, my Python version is: Python 2.7.10

EDIT: When I use an example provided in the answer by retrieving the XML structure from a string, the result is as expected. However, when I try to retrieve from a file, I get empty lists returned (or None).

I am doing the following:

  • Retrieving the XML from a file
  • Including a variable to denote the name attribute (as it is dynamic)

The code loops over each node in a separate XML file, then looks up the description for that node's name in the XSD file:

XMLDoc = etree.parse(open(xmlFile))

for Node in XMLDoc.xpath('//*'):
    nameVariable = os.path.basename(XMLDoc.getpath(Node))
    root = XSDDoc.getroot()
    description = XSDDoc.find(".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(nameVariable), root.nsmap)

If I try to print description.text, I get:

AttributeError: 'NoneType' object has no attribute 'text'

Adam
  • What exactly have you tried? In the code in the question, you don't attempt to get the `xsd:description` element (which is the grandchild of `xsd:type`). – mzjn Nov 19 '19 at 12:19
  • @mzjn sorry, as I've had to remove some sensitive information, I've left out the remaining path following xsd:type. I have edited the question to reflect my exact code. – Adam Nov 19 '19 at 12:25
  • That is not really the "exact" code (what is `nameVariable`?) Please provide a [mcve]. – mzjn Nov 19 '19 at 14:52
  • I have edited my question. nameVariable is simply a string. – Adam Nov 19 '19 at 14:55
  • Sorry to nag about this, but when I ask for a [mcve], I mean **complete but minimal** code (and XML) that I can copy, paste and run without changing anything. – mzjn Nov 19 '19 at 14:59
  • You're not nagging at all! Thanks for your input. I've included more bits from my code which should hopefully answer your concern. Let me know if not. – Adam Nov 19 '19 at 15:17
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202655/discussion-between-adam-and-mzjn). – Adam Nov 19 '19 at 15:34

1 Answer

The predicate ([@name='type1']) must be applied in the right place. The name attribute is on the xsd:type element. This should work:

result = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

# result is a list
for r in result:
    print(r.text)
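
To see why the placement matters, here is a small sketch that runs both expressions against a one-type excerpt of the schema from the question. The explicit NS mapping and the variable names are illustrative assumptions; root.nsmap should work equally well as long as it contains the xsd prefix.

```python
from lxml import etree

# Explicit namespace map; the prefix only has to match the one used in the paths.
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

schema = etree.fromstring(b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>""")

# Predicate on xsd:description: no description element carries a name attribute.
misplaced = schema.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", NS)

# Predicate on xsd:type: the element that actually has name="type1".
placed = schema.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", NS)

print(len(misplaced))
print([d.text for d in placed])
```

The first query selects nothing because the filter is tested against xsd:description, while the second filters xsd:type and then walks down to its description.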

In case you only want a single node, you can use find instead of findall. Complete example:

from lxml import etree

xsdFile = """
<root xmlns:xsd='http://whatever.com'>
 <xsd:type name="type1">
     <xsd:example>
       <xsd:description>This is the description of said type1 tag</xsd:description>
     </xsd:example>
 </xsd:type>
</root>"""

root = etree.fromstring(xsdFile)
result = root.find(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

print(result.text)
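
For the file-based loop in the edited question, here is a minimal sketch, assuming the sample XSD and XML from the question (inlined as byte strings so the snippet runs on its own). With that data, theRoot and theChild have no matching xsd:type, so find returns None for them, and calling .text on that None raises the AttributeError shown. The helper name describe_elements and the explicit ns mapping are illustrative, not part of the original code.

```python
from lxml import etree

# Sample data copied from the question; in real use these would come from files.
XSD_TEXT = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""

XML_TEXT = b"""<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>"""


def describe_elements(xsd_root, xml_root):
    """Map each XML element name to its xsd:description text, or None if absent."""
    ns = {"xsd": "http://www.w3.org/2001/XMLSchema"}
    descriptions = {}
    # tag=etree.Element restricts iteration to elements (no comments or PIs).
    for node in xml_root.iter(tag=etree.Element):
        name = etree.QName(node).localname
        path = ".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(name)
        match = xsd_root.find(path, ns)
        # find returns None when no xsd:type has this name, so guard before .text.
        descriptions[name] = match.text if match is not None else None
    return descriptions


xsd_root = etree.fromstring(XSD_TEXT)
xml_root = etree.fromstring(XML_TEXT)
descriptions = describe_elements(xsd_root, xml_root)
for name, text in sorted(descriptions.items()):
    print("{0}: {1}".format(name, text))
```

With files, etree.parse(xsdFile).getroot() and etree.parse(xmlFile).getroot() can be passed in instead. Taking the name from the element's tag rather than from getpath also avoids positional suffixes such as type2[1], which getpath adds when same-named siblings repeat.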
mzjn
  • Thank you for your answer. However, that piece of code returns an empty list, rather than anything containing the value within the tag. – Adam Nov 19 '19 at 13:25
  • Also, I believe the code would return a list object. How can I extract the value attribute from that list object? – Adam Nov 19 '19 at 13:43
  • Thank you for your help again. I have edited my question based off your answer. Please have a look when you can. – Adam Nov 19 '19 at 14:41