
I want to transform third-party XML files to a CSV file with xmlstarlet. Some of the files declare a default namespace with xmlns, others use unprefixed elements without any xmlns declaration, and others bind the namespace to an explicit prefix instead of making it the default.

Here are smaller files that should clarify my problem.
foo1.xml:

<?xml version="1.0"?>
<root xmlns="http://my.namespace" xmlns:fooNS="http://foo.namespace" xmlns:barNS="http://bar.namespace">
    <fooNS:foo id="1">FOO 1</fooNS:foo>
    <fooNS:foo id="2">FOO 2</fooNS:foo>
    <barNS:bar ref="2" unitRef="Unit1">2000</barNS:bar>
    <unit id="Unit1">
        <measure>bars</measure>
    </unit>
</root>

foo2.xml:

<?xml version="1.0"?>
<root xmlns:fooNS="http://foo.namesapece" xmlns:barNS="http://bar.namespace">
    <fooNS:foo id="1">FOO 1</fooNS:foo>
    <fooNS:foo id="2">FOO 2</fooNS:foo>
    <barNS:bar ref="2" unitRef="Unit1">2000</barNS:bar>
    <unit id="Unit1">
        <measure>bars</measure>
    </unit>
</root>

foo3.xml:

<?xml version="1.0"?>
<myNS:root xmlns:myNS="http://my.namespace" xmlns:fooNS="http://foo.namesapece" xmlns:barNS="http://bar.namespace">
    <fooNS:foo id="1">FOO 1</fooNS:foo>
    <fooNS:foo id="2">FOO 2</fooNS:foo>
    <barNS:bar ref="2" unitRef="Unit1">2000</barNS:bar>
    <unit id="Unit1">
        <measure>bars</measure>
    </unit>
</myNS:root>

Now I want a file with "FOO 2 | 2000 | bars" as output. Attribute "unitRef" is defined as IDREF in the xsd.

This command works for foo1.xml (but NOT for foo2.xml and foo3.xml):

$> xmlstarlet sel -N xbrli="http://my.namespace" \
         -t -m "//fooNS:foo[../barNS:bar/@ref = @id]" \
         -v . -o " | " \
         -v "../barNS:bar[@ref=current()/@id]" -o " | " \
         -v \
"//xbrli:unit[@id=current()/../barNS:bar[@ref=current()/@id]/@unitRef]/xbrli:measure" \
         -n foo1.xml

And this command works for foo2.xml AND foo3.xml (but NOT for foo1.xml):

$> xmlstarlet sel -N xmlns="http://my.namespace" \
         -t -m "//fooNS:foo[../barNS:bar/@ref = @id]" \
         -v . -o " | " \
         -v "../barNS:bar[@ref=current()/@id]" -o " | " \
         -v \
"//unit[@id=current()/../barNS:bar[@ref=current()/@id]/@unitRef]/measure" \
         -n foo[23].xml

Question: is there a syntax that works for all three third-party files? If not with xmlstarlet, then maybe with an XSLT file? Or is it possible to preprocess all the XML files (with xmlstarlet or XSLT) so that they behave the same way?

Thanks.

Cyrus
kdg1955
  • XMLStarlet v1.2.1 and newer has `_:` syntax for the default namespace (see first duplicate link) but your XML files are not all equivalent as resolution of the namespace prefix names and default namespaces do not result in the same element names. (**Make sure you understand the XML namespace differences between your three files before you try to write XPath expressions against them.**) If you truly want to disregard namespaces (generally not recommended), you can test against `local-name()`. See second duplicate link for further details. – kjhughes Jul 12 '18 at 16:47
  • @kjhughes . Thanks. The problem is that I work with third party xml files where the namespaces are declared differently. I'm looking for a syntax that works for the three xml files. If I use the `_:` syntax, it only works for **foo1.xml**, NOT for foo2.xml and foo3.xml. None of the listed answers give an answer to my problem. – kdg1955 Jul 12 '18 at 18:47
  • Be aware that the namespace prefixes themselves only are significant in their binding to namespace URIs; the actual prefix used isn't significant. (That said, your XML files are still not equivalent.) Does `*[local-name() = 'foo']` work for you then to disregard the namespace on `foo`? – kjhughes Jul 12 '18 at 18:54
  • @kjhughes . Thanks again. – kdg1955 Jul 12 '18 at 19:06
  • You're welcome, but are you saying you have resolved your problem, or do you need further help -- wasn't clear to me. – kjhughes Jul 12 '18 at 19:12
  • Sorry, I pressed Enter too quickly! It works with `local-name()`. But as you already mentioned, it's **discouraged**. If nobody comes up with a better solution, I will use it. Too bad I cannot post it as an answer, because I think this is not a duplicate. So the (discouraged) answer is `-v "//*[local-name()='unit'][@id=current()/../barNS:bar[@ref=current()/@id]/@unitRef]/*[local-name()='measure']"`. Nevertheless, thanks again. – kdg1955 Jul 12 '18 at 19:12
  • Ignoring namespaces is generally discouraged because namespaces contribute a vital component to XML component names and aid in namespace management. I'll re-open the question if you'd like to be able to post a self-answer or receive other answers beyond those in the duplicates links. Please cite [How does XPath deal with namespaces](https://stackoverflow.com/q/40796231/290085) if you do post a solution using `local-name()`. – kjhughes Jul 12 '18 at 19:17
  • Up to you. If someone comes up with a better solution, then maybe yes. I appreciate your help. – kdg1955 Jul 12 '18 at 19:20
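
The point made in the comments, that the three files are not equivalent, can be checked directly. The following is a minimal sketch, assuming xmlstarlet is on the PATH. The scratch directory, the trimmed-down copies of the sample files and the variable name report are illustrative and not part of the original post. It prints the namespace URI that the unit element resolves to in each file.

```shell
#!/bin/sh
# Hypothetical scratch directory with trimmed copies of the three samples.
dir=$(mktemp -d)

cat > "$dir/foo1.xml" <<'EOF'
<root xmlns="http://my.namespace"><unit id="Unit1"><measure>bars</measure></unit></root>
EOF
cat > "$dir/foo2.xml" <<'EOF'
<root><unit id="Unit1"><measure>bars</measure></unit></root>
EOF
cat > "$dir/foo3.xml" <<'EOF'
<myNS:root xmlns:myNS="http://my.namespace"><unit id="Unit1"><measure>bars</measure></unit></myNS:root>
EOF

# For each file, select <unit> by local name only and print the namespace
# URI it actually belongs to (empty brackets mean "no namespace").
report=$(
  for f in foo1 foo2 foo3; do
    xmlstarlet sel -t -m "//*[local-name()='unit']" \
      -o "$f: [" -v "namespace-uri()" -o "]" -n "$dir/$f.xml"
  done
)
printf '%s\n' "$report"

rm -rf "$dir"
```

For these samples it should report http://my.namespace for foo1.xml and an empty namespace for foo2.xml and foo3.xml. In foo3.xml only the root element carries the myNS prefix, so its unprefixed children are in no namespace. This is why one namespace-qualified name test for unit cannot match in all three files.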

1 Answer


Because nobody came up with a better solution, I used the solution proposed by @kjhughes.

So the (not recommended) answer is: -v "//*[local-name()='unit'][@id=current()/../barNS:bar[@ref=current()/@id]/@unitRef]/*[local-name()='measure']"

See also "How does XPath deal with XML namespaces?" for a description of all the recommended solutions. But none of them works for all my cases.
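
For completeness, here is a sketch of a full command built on the same local-name() idea, assuming xmlstarlet is on the PATH. Unlike the fragment above, it also matches foo and bar by local name, so it does not depend on how any prefix is declared. The helper write_sample, the scratch directory and the variables foo, bar and result are illustrative names, and the samples use http://foo.namespace for fooNS throughout, which is an assumption (the files in the question spell that URI inconsistently).

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical scratch directory for the sample files

# write_sample FILE ROOT_START_TAG ROOT_END_TAG
# Writes the shared sample body wrapped in the given root element.
write_sample() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
$2
    <fooNS:foo id="1">FOO 1</fooNS:foo>
    <fooNS:foo id="2">FOO 2</fooNS:foo>
    <barNS:bar ref="2" unitRef="Unit1">2000</barNS:bar>
    <unit id="Unit1">
        <measure>bars</measure>
    </unit>
$3
EOF
}

ns='xmlns:fooNS="http://foo.namespace" xmlns:barNS="http://bar.namespace"'
write_sample "$dir/foo1.xml" "<root xmlns=\"http://my.namespace\" $ns>" "</root>"
write_sample "$dir/foo2.xml" "<root $ns>" "</root>"
write_sample "$dir/foo3.xml" "<myNS:root xmlns:myNS=\"http://my.namespace\" $ns>" "</myNS:root>"

# Namespace-agnostic name tests, reused in the expressions below.
foo="*[local-name()='foo']"
bar="*[local-name()='bar']"

# Run the same expression against each file, one file per invocation.
result=$(
  for f in "$dir/foo1.xml" "$dir/foo2.xml" "$dir/foo3.xml"; do
    xmlstarlet sel -t \
      -m "//${foo}[../${bar}/@ref = @id]" \
      -v . -o " | " \
      -v "../${bar}[@ref=current()/@id]" -o " | " \
      -v "//*[local-name()='unit'][@id=current()/../${bar}[@ref=current()/@id]/@unitRef]/*[local-name()='measure']" \
      -n "$f"
  done
)
printf '%s\n' "$result"

rm -rf "$dir"
```

With these samples each file should yield the line FOO 2 | 2000 | bars. The trade-off is the one raised in the comments: matching on local-name() alone would also accept a same-named element from an unrelated namespace, so this is only reasonable when the input vocabulary is known.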

kdg1955