I have an XML file that holds the training data, but it has no root element, and xml.etree.ElementTree cannot parse XML files that lack one.

<root>
<?xml version = '1.0' encoding = 'ISO-8859-1'?>
<?xml-stylesheet type = 'text/xsl' href = 'image_metadata_stylesheet.xsl'?>
<labels>
<label nr="0" desc="background" colorred="0" colorgreen="0" colorblue="0"/>
<label nr="1" desc="generalface" colorred="128" colorgreen="128" colorblue="128"/>
<label nr="2" desc="left eye" colorred="0" colorgreen="255" colorblue="0"/>
<label nr="3" desc="right eye" colorred="0" colorgreen="128" colorblue="0"/>
<label nr="4" desc="nose" colorred="0" colorgreen="0" colorblue="255"/>
<label nr="5" desc="left ear" colorred="0" colorgreen="255" colorblue="255"/>
<label nr="6" desc="right ear" colorred="0" colorgreen="64" colorblue="64"/>
<label nr="7" desc="lips" colorred="255" colorgreen="0" colorblue="0"/>
<label nr="8" desc="left eyebrow" colorred="255" colorgreen="0" colorblue="255"/>
<label nr="9" desc="right eyebrow" colorred="128" colorgreen="0" colorblue="128"/>
<label nr="10" desc="hair" colorred="255" colorgreen="255" colorblue="0"/>
<label nr="11" desc="teeth" colorred="255" colorgreen="255" colorblue="255"/>
<label nr="12" desc="specs" colorred="0" colorgreen="128" colorblue="128"/>
<label nr="13" desc="beard" colorred="255" colorgreen="192" colorblue="192"/>
</labels>
<trainingdata>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0000.png"/>
<labelimg name="labels\female01\headrende0000.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0003.png"/>
<labelimg name="labels\female01\headrende0003.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0006.png"/>
<labelimg name="labels\female01\headrende0006.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0009.png"/>
<labelimg name="labels\female01\headrende0009.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0012.png"/>
<labelimg name="labels\female01\headrende0012.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0015.png"/>
<labelimg name="labels\female01\headrende0015.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0018.png"/>
<labelimg name="labels\female01\headrende0018.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0021.png"/>
<labelimg name="labels\female01\headrende0021.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0024.png"/>
<labelimg name="labels\female01\headrende0024.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0027.png"/>
<labelimg name="labels\female01\headrende0027.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0030.png"/>
<labelimg name="labels\female01\headrende0030.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0033.png"/>
<labelimg name="labels\female01\headrende0033.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0036.png"/>
<labelimg name="labels\female01\headrende0036.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0039.png"/>
<labelimg name="labels\female01\headrende0039.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0042.png"/>
<labelimg name="labels\female01\headrende0042.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0045.png"/>
<labelimg name="labels\female01\headrende0045.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0048.png"/>
<labelimg name="labels\female01\headrende0048.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0051.png"/>
<labelimg name="labels\female01\headrende0051.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0054.png"/>
<labelimg name="labels\female01\headrende0054.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0057.png"/>
<labelimg name="labels\female01\headrende0057.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0060.png"/>
<labelimg name="labels\female01\headrende0060.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0063.png"/>
<labelimg name="labels\female01\headrende0063.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0066.png"/>
<labelimg name="labels\female01\headrende0066.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0069.png"/>
<labelimg name="labels\female01\headrende0069.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0072.png"/>
<labelimg name="labels\female01\headrende0072.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0075.png"/>
<labelimg name="labels\female01\headrende0075.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0078.png"/>
<labelimg name="labels\female01\headrende0078.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0081.png"/>
<labelimg name="labels\female01\headrende0081.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0084.png"/>
<labelimg name="labels\female01\headrende0084.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0087.png"/>
<labelimg name="labels\female01\headrende0087.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0090.png"/>
<labelimg name="labels\female01\headrende0090.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0093.png"/>
<labelimg name="labels\female01\headrende0093.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0096.png"/>
<labelimg name="labels\female01\headrende0096.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0099.png"/>
<labelimg name="labels\female01\headrende0099.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0102.png"/>
<labelimg name="labels\female01\headrende0102.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0105.png"/>
<labelimg name="labels\female01\headrende0105.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0108.png"/>
<labelimg name="labels\female01\headrende0108.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0111.png"/>
<labelimg name="labels\female01\headrende0111.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0114.png"/>
<labelimg name="labels\female01\headrende0114.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0117.png"/>
<labelimg name="labels\female01\headrende0117.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0120.png"/>
<labelimg name="labels\female01\headrende0120.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0123.png"/>
<labelimg name="labels\female01\headrende0123.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\headrende0126.png"/>
<labelimg name="labels\female01\headrende0126.png"/>
...
</trainingdata>

I specifically want to traverse every entry within <trainingdata></trainingdata> and read the name attribute of each <srcimg> and <labelimg>. How can I go about doing this?

Onur-Andros Ozbek
  • That's not XML. An XML document must have exactly one root element. To work around this, see @larsks's answer (+1) or the [canonical Q/A for parsing bad "XML"](https://stackoverflow.com/q/44765194/290085), which has several relevant Python recommendations. – kjhughes Feb 22 '22 at 23:17

1 Answer

I think the only solution here is to wrap the data in a root element and then parse it. For example:

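from lxml import etree

directives = []
body = []
with open("data.xml", "rb") as fd:
    for line in fd:
        # The XML declaration and the stylesheet instruction must stay
        # outside (before) the wrapper element.
        if line.startswith(b"<?xml"):
            directives.append(line)
        else:
            body.append(line)

# Wrap everything else in a single artificial root element.
data = b"".join(
    directives + [b"<fakeroot>"] + body + [b"</fakeroot>"]
)
doc = etree.fromstring(data)

# Select every srcimg element directly under trainingdata.
for x in doc.xpath("/fakeroot/trainingdata/srcimg"):
    print(x)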

If I place the sample data from your question in a file named data.xml and run the above code, I see as output:

<Element srcimg at 0x7f0501353380>
<Element srcimg at 0x7f05013533c0>
<Element srcimg at 0x7f0501353400>
<Element srcimg at 0x7f0501353440>
<Element srcimg at 0x7f0501353480>
<Element srcimg at 0x7f0501353500>
<Element srcimg at 0x7f0501353540>
<Element srcimg at 0x7f0501353580>
<Element srcimg at 0x7f05013535c0>
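
To read the name attributes rather than print the element objects, and to pair each srcimg with its labelimg, the same wrapping idea can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under assumptions: `wrap_in_root` and `image_pairs` are hypothetical helper names, the `sample` bytes are a shortened copy of the data in the question, and the pairing assumes srcimg and labelimg entries alternate in equal numbers.

```python
import xml.etree.ElementTree as ET


def wrap_in_root(raw: bytes, tag: bytes = b"fakeroot") -> bytes:
    """Wrap every line that is not an <?xml ...?> directive in one root element."""
    directives, body = [], []
    for line in raw.splitlines(keepends=True):
        # Directives (declaration, stylesheet) must stay before the root element.
        if line.startswith(b"<?xml"):
            directives.append(line)
        else:
            body.append(line)
    return b"".join(directives + [b"<" + tag + b">"] + body + [b"</" + tag + b">"])


def image_pairs(raw: bytes):
    """Yield (srcimg name, labelimg name) tuples in document order."""
    doc = ET.fromstring(wrap_in_root(raw))
    # iter() finds trainingdata at any depth below the artificial root.
    for training in doc.iter("trainingdata"):
        sources = [e.get("name") for e in training.findall("srcimg")]
        labels = [e.get("name") for e in training.findall("labelimg")]
        # Assumption: the i-th srcimg belongs to the i-th labelimg.
        yield from zip(sources, labels)


# Shortened copy of the sample data from the question (assumption: same layout).
sample = b"""<?xml version = '1.0' encoding = 'ISO-8859-1'?>
<?xml-stylesheet type = 'text/xsl' href = 'image_metadata_stylesheet.xsl'?>
<labels>
<label nr="0" desc="background" colorred="0" colorgreen="0" colorblue="0"/>
</labels>
<trainingdata>
<srcimg name="female01_brownhair_blueeyes_env02\\headrende0000.png"/>
<labelimg name="labels\\female01\\headrende0000.png"/>
<srcimg name="female01_brownhair_blueeyes_env02\\headrende0003.png"/>
<labelimg name="labels\\female01\\headrende0003.png"/>
</trainingdata>
"""

pairs = list(image_pairs(sample))
for src, lab in pairs:
    print(src, "->", lab)
```

With a real file, the bytes could be read with `open("data.xml", "rb").read()` and passed to `image_pairs` in place of `sample`.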
larsks
  • The `print(x)` prints nothing – Onur-Andros Ozbek Feb 23 '22 at 02:34
  • If I copy and paste the sample data from your question into a file named `data.xml` and run this code, it produces output. If your sample data doesn't match your actual data, that could produce different results. – larsks Feb 23 '22 at 02:43
  • It prints everything except `print(x)`. I've added several print statements to confirm. Apparently it's not seeing `/fakeroot/trainingdata/scrimg` – Onur-Andros Ozbek Feb 23 '22 at 02:56
  • The code in this answer produces the output shown in this answer when using the sample data from your question. Here's a complete demo: https://asciinema.org/a/g8LvfnKIkDQlJRuK7iK52OA0g – larsks Feb 23 '22 at 03:34
  • I've updated the data to what it actually looks like. – Onur-Andros Ozbek Feb 23 '22 at 03:55
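
On the empty output discussed in the comments above: a path that names every level matches only when trainingdata is a direct child of the wrapper element, whereas a descendant search finds srcimg at any depth. The thread does not confirm that this was the cause. The sketch below only illustrates the difference, using the standard library's xml.etree.ElementTree and a hypothetical document in which trainingdata sits one level deeper; the file names in it are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: trainingdata is nested under an extra <root> element.
doc = ET.fromstring(
    "<fakeroot><root><trainingdata>"
    '<srcimg name="a.png"/><labelimg name="a_label.png"/>'
    "</trainingdata></root></fakeroot>"
)

# Requires trainingdata to be a direct child of fakeroot: finds nothing here.
direct = doc.findall("./trainingdata/srcimg")

# Descendant search: finds srcimg wherever it is nested.
anywhere = doc.findall(".//srcimg")

print(len(direct), len(anywhere))  # prints: 0 1
```

A descendant search trades strictness for tolerance: it keeps working if the nesting changes, but it would also pick up srcimg elements from unrelated parts of a document.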