
I'm trying to parse a number of XML files that only sometimes have xmlns set. Is there any way to determine whether it's set without using the lxml library?

My main issue is that when finding elements using find or findall, nothing is returned if the namespace is set, since the tag doesn't match. But I can't hardcode the namespace, because sometimes there is no namespace set. I don't really know how to go about this.

Here's a sample of my code:

 import xml.etree.ElementTree as ET

 tree = ET.parse(xml_file_path)
 root = tree.getroot()  # ONIXmessage
 ...
 pids = product.findall("productidentifier")  # empty list when xmlns is set
 ...

So my main issue is with the findall() method.

Thanks.

user

2 Answers


It's kind of a pain, but you could use local-name() in your XPath.

For example, instead of:

/foo/bar/baz

try:

/*[local-name()='foo']/*[local-name()='bar']/*[local-name()='baz']
Daniel Haley
  • This looks right in principle, but if the OP must use ElementTree, then it won't work (this module only supports [a limited subset of XPath](https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax)). It should work with [lxml](http://lxml.de) which does support full XPath (1.0). – mzjn Apr 11 '14 at 15:41
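For readers who have to stay with ElementTree, a roughly equivalent effect to local-name() can be had with the `{*}` namespace wildcard, which find and findall accept from Python 3.8 onward. The sketch below is only an illustration under that version assumption; the sample documents, the namespace URI `http://example.com/ns`, and the helper name `find_identifiers` are hypothetical, with element names borrowed from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same structure, with and without a default namespace.
PLAIN = (
    "<ONIXmessage><product>"
    "<productidentifier>1</productidentifier>"
    "</product></ONIXmessage>"
)
NAMESPACED = (
    '<ONIXmessage xmlns="http://example.com/ns"><product>'
    "<productidentifier>1</productidentifier>"
    "</product></ONIXmessage>"
)


def find_identifiers(xml_text):
    """Return productidentifier elements whether or not a namespace is set."""
    root = ET.fromstring(xml_text)
    # "{*}name" matches "name" in any namespace or in none (Python 3.8+ only).
    return root.findall("{*}product/{*}productidentifier")


if __name__ == "__main__":
    print(len(find_identifiers(PLAIN)))
    print(len(find_identifiers(NAMESPACED)))
```

On older Python versions the wildcard is not available, so one of the fallback approaches discussed in this thread would be needed instead.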

I will shortly be facing this problem too. My thought was: use a wrapper function that first tries to get the elements without the namespace specified, and if that returns nothing (None from find, an empty list from findall), tries again with the namespace. If both attempts come back empty, the elements were not present. Running both lookups unconditionally (without an if-else on whether the file declares a namespace) also works nicely when no default namespace is provided.

If the choice is between the same namespace either being specified or not, then I think the approach above is okay. If you have multiple optional namespaces, it will make your wrapper more complicated, but it's a one-time effort.
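A minimal sketch of such a wrapper might look like the following, assuming a single known namespace URI. The function name `findall_with_fallback`, the URI `http://example.com/ns`, and the sample documents are all hypothetical and only there to make the example runnable.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the one your files actually declare.
NS = "http://example.com/ns"


def findall_with_fallback(element, tag, namespace=NS):
    """Try the bare tag first, then the namespace-qualified tag."""
    found = element.findall(tag)
    if not found:
        # ElementTree spells qualified names as "{uri}localname".
        found = element.findall(f"{{{namespace}}}{tag}")
    return found


# Hypothetical sample products, one without and one with the namespace.
plain_product = ET.fromstring(
    "<product><productidentifier>1</productidentifier></product>"
)
ns_product = ET.fromstring(
    f'<product xmlns="{NS}"><productidentifier>1</productidentifier></product>'
)

if __name__ == "__main__":
    print(len(findall_with_fallback(plain_product, "productidentifier")))
    print(len(findall_with_fallback(ns_product, "productidentifier")))
```

Note that findall returns an empty list rather than None when nothing matches, which is why the check uses `not found`; a find-based variant would compare against None instead.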

I would like to see a more elegant solution for this, though. Did Daniel Haley's answer work?

aneroid
  • Wasn't quite sure how to use Daniel's solution, so I ended up writing a wrapper function that adds the namespace if present. I grab the namespace using a regex. Probably not the ideal solution, but it seems to work for now. Thanks for your help! – user Apr 10 '14 at 21:28
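The commenter's code is not shown, so the following is only a guess at what a regex-based approach could look like. It relies on the fact that ElementTree stores a qualified tag as `{uri}localname`, so the namespace (if any) can be read off the root element's tag. The helper names `namespace_prefix` and `find_identifiers`, the URI `http://example.com/ns`, and the sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def namespace_prefix(element):
    """Return '{uri}' if the element's tag is namespace-qualified, else ''."""
    match = re.match(r"\{[^}]*\}", element.tag)
    return match.group(0) if match else ""


def find_identifiers(xml_text):
    """Find productidentifier elements, adding the namespace only if present."""
    root = ET.fromstring(xml_text)
    ns = namespace_prefix(root)
    # With no namespace, ns is '' and the path reduces to the plain tag names.
    return root.findall(f"{ns}product/{ns}productidentifier")


# Hypothetical sample documents modelled on the question.
PLAIN = (
    "<ONIXmessage><product>"
    "<productidentifier>1</productidentifier>"
    "</product></ONIXmessage>"
)
NAMESPACED = (
    '<ONIXmessage xmlns="http://example.com/ns"><product>'
    "<productidentifier>1</productidentifier>"
    "</product></ONIXmessage>"
)

if __name__ == "__main__":
    print(repr(namespace_prefix(ET.fromstring(PLAIN))))
    print(repr(namespace_prefix(ET.fromstring(NAMESPACED))))
```

This also answers the original question of detecting whether xmlns is set without lxml: an empty prefix means the root element carries no namespace. It assumes child elements share the root's namespace, which holds for a default namespace declared on the root.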