python elementTree get attribute that ends with

Question

Given the following XML as input to ElementTree (using Python 2.7):

 <body>
<div region="imageRegion" xml:id="img_SUB6756004155_0" ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0">
</body>

The attributes I get back have namespace-qualified names, so I need to find the attribute whose name ends with 'backgroundImage' or 'id'.

Normally I would do it like this:

 div.get('region')

but here I only know part of the attribute name.

Is it possible to use a regex?

Read [parsing-xml-with-namespace-in-python-via-elementtree](https://stackoverflow.com/questions/14853243/parsing-xml-with-namespace-in-python-via-elementtree) — stovfl, Jan 16 '19 at 17:32
@mzjn the div has an end tag '>' and beside, that's not the point — Ortal Turgeman, Jan 17 '19 at 09:49
@stovfl i read that, but if i understand correctly i need to know the namespace and define it before, what if i don't know it (or it could change) can i find an attribute with only part of the string? — Ortal Turgeman, Jan 17 '19 at 10:02
@OrtalTurgeman: *"if i don't know it"*: Read the last sentence in the Answer of [parsing-xml-with-namespace-in-python-via-elementtree](https://stackoverflow.com/a/14853417/7414759) — stovfl, Jan 17 '19 at 10:24
if you mean to use lxml, i can't use it, it doesn't exist by default (probably needs installation) and i need something that will work on every machine with python — Ortal Turgeman, Jan 17 '19 at 10:32

score 1 · Accepted Answer · answered Jan 17 '19 at 19:12

Another option would be to iterate over the attributes and return the value of the attribute with a local-name that ends in backgroundImage.

Example...

from xml.etree import ElementTree as ET

XML = '''
<body xmlns:ttm="http://www.w3.org/ns/ttml#metadata" 
      xmlns:smpte="http://smpte-ra.org/schemas/2052-1/2013/smpte-tt">
  <div region="imageRegion" xml:id="img_SUB6756004155_0" 
       ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0"></div>
</body>'''

root = ET.fromstring(XML)
div = root.find("div")
val = next((v for k, v in div.attrib.items() if k.endswith('backgroundImage')), None)

if val is not None:
    # str.format() instead of an f-string so this also runs on Python 2.7
    print("Value: {}".format(val))

Outputs...

Value: #SUB6756004155_0

This can be fragile though. It only returns the first attribute found.

If that's a problem, maybe use a list instead:

val = [v for k, v in div.attrib.items() if k.endswith('backgroundImage')]

It would also incorrectly match any attribute whose name merely ends with "backgroundImage" (like "invalid_backgroundImage").

If that's a problem, maybe use regex instead:

import re

val = next((v for k, v in div.attrib.items() if re.match(r".*}backgroundImage$", "}" + k)), None)
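A regex is not strictly required for the exact-match case. As a minimal sketch using only the standard library, the namespace part of each attribute name can be stripped and the remaining local name compared exactly. The helper names `local_name` and `get_by_local_name` are hypothetical, and the `invalid_backgroundImage` attribute is added to the sample XML purely as a decoy for illustration.

```python
from xml.etree import ElementTree as ET

XML = '''
<body xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
      xmlns:smpte="http://smpte-ra.org/schemas/2052-1/2013/smpte-tt">
  <div region="imageRegion" xml:id="img_SUB6756004155_0"
       ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0"
       invalid_backgroundImage="decoy"></div>
</body>'''


def local_name(qname):
    """Strip a leading '{namespace-uri}' from an attribute name, if present."""
    return qname.rsplit("}", 1)[-1]


def get_by_local_name(element, wanted, default=None):
    """Return the first attribute value whose local name equals `wanted`."""
    for name, value in element.attrib.items():
        if local_name(name) == wanted:
            return value
    return default


div = ET.fromstring(XML).find("div")
image = get_by_local_name(div, "backgroundImage")  # ignores the decoy attribute
ident = get_by_local_name(div, "id")
```

Because the comparison is on the whole local name, the decoy attribute is skipped, while a plain `endswith('backgroundImage')` test would match both attributes. Like the `next(...)` version, this still returns only the first match if two namespaces define the same local name.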

If you're ever able to switch to lxml, the local-name test can be done directly in XPath (note that xpath() returns a list of matching values)...

val = div.xpath("@*[local-name()='backgroundImage']")

score 0 · Answer 2 · answered Jan 17 '19 at 16:48

The snippet below demonstrates how you can get the value of the smpte:backgroundImage attribute from a well-formed XML document (the input document in the question is not well-formed).

smpte: means that the attribute is bound to a namespace, which is http://smpte-ra.org/schemas/2052-1/2013/smpte-tt, judging by the screenshot. Note that both the ttm and smpte prefixes must be declared in the XML document (xmlns:ttm="..." and xmlns:smpte="...").

In the get() call, the attribute name must be given in "Clark notation": {http://smpte-ra.org/schemas/2052-1/2013/smpte-tt}backgroundImage.

from xml.etree import ElementTree as ET

XML = '''
<body xmlns:ttm="http://www.w3.org/ns/ttml#metadata" 
      xmlns:smpte="http://smpte-ra.org/schemas/2052-1/2013/smpte-tt">
  <div region="imageRegion" xml:id="img_SUB6756004155_0" 
       ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0"></div>
</body>'''

root = ET.fromstring(XML)
div = root.find("div")
print(div.get("{http://smpte-ra.org/schemas/2052-1/2013/smpte-tt}backgroundImage"))

Output:

#SUB6756004155_0
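To see why the full `{uri}name` form is needed, it can help to list the attribute names exactly as ElementTree stores them. The sketch below reuses the same sample document; nothing in it goes beyond the standard library.

```python
from xml.etree import ElementTree as ET

XML = '''
<body xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
      xmlns:smpte="http://smpte-ra.org/schemas/2052-1/2013/smpte-tt">
  <div region="imageRegion" xml:id="img_SUB6756004155_0"
       ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0"></div>
</body>'''

div = ET.fromstring(XML).find("div")

# Prefixed attributes are stored as '{namespace-uri}local-name';
# the unprefixed 'region' attribute keeps its bare name.
for name in sorted(div.attrib):
    print(name)
```

The prefixes themselves (`smpte:`, `ttm:`, `xml:`) do not appear in the keys, which is why `div.get('smpte:backgroundImage')` finds nothing. The built-in `xml` prefix resolves to `http://www.w3.org/XML/1998/namespace`.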

but this `http://smpte-ra.org/schemas/2052-1/2013/smpte-tt` could change, it's not hard coded — Ortal Turgeman, Jan 20 '19 at 09:20
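If the namespace URI is not known in advance, one possible approach with only the standard library is to collect the prefix declarations while parsing, using `iterparse` with the `start-ns` event, and then build the Clark-notation name at run time. This is a sketch that assumes the `smpte` prefix itself stays the same, which the question does not confirm; the `ns` and `root` variable names are illustrative.

```python
import io
from xml.etree import ElementTree as ET

XML = b'''<body xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
      xmlns:smpte="http://smpte-ra.org/schemas/2052-1/2013/smpte-tt">
  <div region="imageRegion" xml:id="img_SUB6756004155_0"
       ttm:role="caption" smpte:backgroundImage="#SUB6756004155_0"></div>
</body>'''

ns = {}      # prefix -> namespace URI, filled in from the document itself
root = None  # first element seen, i.e. the document root

for event, payload in ET.iterparse(io.BytesIO(XML), events=("start", "start-ns")):
    if event == "start-ns":
        prefix, uri = payload  # for 'start-ns' the payload is a (prefix, uri) tuple
        ns[prefix] = uri
    elif root is None:
        root = payload         # for 'start' the payload is the element

# Once the loop has finished, the tree under `root` is fully built.
div = root.find("div")
value = div.get("{%s}backgroundImage" % ns["smpte"])
```

If the prefix can change as well, matching on the local name alone is the remaining option, with the ambiguity that implies.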

score 0 · Answer 3 · answered Jan 20 '19 at 12:54

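This solution also worked for me:

import re

r = re.compile(r'img_.+')
# a list comprehension (rather than filter()) stays indexable on Python 3 as well
image_ids = [v for v in div.attrib.values() if r.match(v)]
image_id = image_ids[0].split('_', 1)[1]

Result: image_id == 'SUB6756004155_0'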

Ortal Turgeman