
My XML file:

<xml 
xmlns="http://www.myweb.org/2003/instance"
xmlns:link="http://www.myweb.org/2003/linkbase"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:iso4217="http://www.myweb.org/2003/iso4217"
xmlns:utr="http://www.myweb.org/2009/utr">

<link:schemaRef xlink:type="simple" xlink:href="http://www.myweb.com/form/2020-01-01/test.xsd"></link:schemaRef>

I want to get the URL stored in the xlink:href attribute of the <link:schemaRef> tag, i.e. http://www.myweb.com/form/2020-01-01/test.xsd.

My Python code below finds the <link:schemaRef> tag, but I am unable to retrieve the URL from it.

from lxml import etree
with open(filepath, 'rb') as f:
    file = f.read()
root = etree.XML(file)
print(root.nsmap["link"]) #http://www.myweb.org/2003/linkbase
print(root.find(".//{"+root.nsmap["link"]+"}"+"schemaRef")) 
user2961127
  • Does this answer your question? [Get attribute names and values from ElementTree](https://stackoverflow.com/questions/14323335/get-attribute-names-and-values-from-elementtree) – Joe Apr 12 '20 at 05:57

2 Answers


Use:

>>> child = root[0]   # first child element; getchildren() is deprecated in lxml
>>> child.attrib
{'{http://www.w3.org/1999/xlink}type': 'simple', '{http://www.w3.org/1999/xlink}href': 'http://www.myweb.com/form/2020-01-01/test.xsd'}
>>> url = child.attrib['{http://www.w3.org/1999/xlink}href']

However, I suspect the real difficulty is knowing which key to use (i.e. {http://www.w3.org/1999/xlink}href). If so, the key can be built from the element's namespace map:

>>> key_prefix = root.nsmap['xlink']   # Notice that the requested URL is an href in the xlink namespace
>>> print(key_prefix)
http://www.w3.org/1999/xlink
>>> key_url = "{" + key_prefix + "}href"
>>> print(child.attrib[key_url])
http://www.myweb.com/form/2020-01-01/test.xsd
Ji Wei

Try it this way and see if it works:

for i in root.xpath('//*/node()'):
    # node() also yields text nodes, so keep only elements
    if isinstance(i, etree._Element):
        print(i.values()[1])  # second attribute value, here xlink:href

Output:

http://www.myweb.com/form/2020-01-01/test.xsd
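
As an alternative that does not rely on attribute order, the XPath expression can select the attribute directly. This is only a sketch: it assumes the XML from the question (with a closing </xml> tag added so it parses), and the prefix map ns is written out by hand here rather than taken from root.nsmap, because lxml's XPath expects every prefix in the map to be a non-empty string.

```python
from lxml import etree

# Inline copy of the question's XML. The closing </xml> tag is an
# assumption added here so that the snippet is self-contained.
XML = b"""<xml
xmlns="http://www.myweb.org/2003/instance"
xmlns:link="http://www.myweb.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink">
<link:schemaRef xlink:type="simple" xlink:href="http://www.myweb.com/form/2020-01-01/test.xsd"></link:schemaRef>
</xml>"""

root = etree.XML(XML)

# Prefix-to-URI map for the XPath call; prefixes must be non-empty strings.
ns = {
    "link": "http://www.myweb.org/2003/linkbase",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Select the xlink:href attribute of every link:schemaRef element.
hrefs = root.xpath("//link:schemaRef/@xlink:href", namespaces=ns)

# xpath() returns a list, which is empty when nothing matches.
first_href = hrefs[0] if hrefs else None
print(first_href)
```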
Jack Fleeting