I'm trying to follow some of the other XML parsing questions already posted here, but my XML seems to be somewhat unusual. I'm trying to parse https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/primary.xml

I tried to do something like:

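import urllib.request
import xml.etree.ElementTree as etree  # assuming the standard-library ElementTree

url = 'https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/primary.xml'

opener = urllib.request.build_opener()
tree = etree.parse(opener.open(url))
root = tree.getroot()

# Print the tag and attributes of every direct child of the root element
for child in root:
    print(child.tag, child.attrib)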

But this prints a line like the following for every child: {http://linux.duke.edu/metadata/common}package {'type': 'rpm'}

I don't understand why the child's tag includes the "{http://linux.duke.edu/metadata/common}" part.

embedded
  • *"I dont get why the child's tag includes the "{http://linux.duke.edu/metadata/common}" part."* - Because the elements are in the `http://linux.duke.edu/metadata/common` namespace, and that prefix is ElementTree's way of telling you. (Before you ask, no, you can't get rid of it.) Ask the question you really want to ask. – Tomalak Sep 08 '20 at 08:16
  • OK, thanks. Well, the question would be: how do I iterate over the package elements? What I really need from that XML is the location URL. I tried to get it with `for location in root.iter('location')`, but that doesn't seem to work. – embedded Sep 08 '20 at 08:41
  • One of the ways: You can prefix the namespace, just as ElementTree does. There are uncounted examples on this site alone how to work with a default namespace (that's how this is called) in ElementTree, take a look around. – Tomalak Sep 08 '20 at 08:47
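
The namespace-prefix approach mentioned in the comments could look roughly like the sketch below. It assumes the standard-library `xml.etree.ElementTree` and that each `package` element has a `location` child with an `href` attribute. To stay runnable offline it parses a small made-up sample instead of the real `primary.xml`; the `SAMPLE` document, its `href` values, the `common` prefix, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample mimicking the assumed layout of primary.xml:
# a default namespace on the root, with <location href="..."/> inside each package.
SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>kubeadm</name>
    <location href="../../pool/example-kubeadm.rpm"/>
  </package>
  <package type="rpm">
    <name>kubectl</name>
    <location href="../../pool/example-kubectl.rpm"/>
  </package>
</metadata>"""

COMMON_NS = "http://linux.duke.edu/metadata/common"


def hrefs_with_clark_notation(root):
    """Prefix the tag with {namespace}, the same form ElementTree prints."""
    return [loc.get("href") for loc in root.iter(f"{{{COMMON_NS}}}location")]


def hrefs_with_prefix_map(root):
    """Map an arbitrary prefix to the namespace and use it in the path."""
    ns = {"common": COMMON_NS}  # "common" is just a local label
    return [loc.get("href")
            for loc in root.findall("common:package/common:location", ns)]


root = ET.fromstring(SAMPLE)
for href in hrefs_with_clark_notation(root):
    print(href)
```

With the real file, `root` would come from `tree.getroot()` as in the question; `root.iter('location')` finds nothing there because the unqualified name `location` does not match the namespaced tag.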
