
Edit: Following advice from @michael.hor257k, I have also included the verbose form of the input XML.

I have input XML that I can't control, and I need to transform it with XSLT. Please note that, unlike many "related answers" on SO, in this case the depth level is not given as an attribute of each item; it has to be calculated from the path.

Here is an XSLT fiddle of my problem: http://xsltransform.net/6rexjhn/3

Here is a simplified form of the input (compact form):

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <settings>
        <field style="attribute" level="one.quality" target="one.quality">high</field>
        <field style="attribute" level="one.weight" target="one.weight">10 kg</field>
        <field style="element" level="one_two" target="two">A</field>
        <field style="attribute" level="one_three.color" target="three.color">black</field>
        <field style="element" level="one_three_four" target="four" >B</field>
        <field style="attribute" level="one_three_four.length" target="four.length">12 cm</field>
        <field style="attribute" level="one_three_four.width" target="four.width"> 7 cm</field>
        <field style="element" level="one_three_five" target="five" >C</field>
        <field style="attribute" level="one_six.size" target="six.size" >large</field>
        <field style="element" level="one_six_seven_eight" target="eight">D</field>
        <field style="element" level="one_nine" target="nine">E</field>
    </settings>
</data>

And here is the verbose form:

<?xml version="1.0" encoding="UTF-8"?>
<data>
    <settings>
        <field style="element" level="one" target="one"></field>
        <field style="attribute" level="one.quality" target="one.quality">high</field>
        <field style="attribute" level="one.weight" target="one.weight">10 kg</field>
        <field style="element" level="one_two" target="two">A</field>
        <field style="element" level="one_three" target="three"></field>
        <field style="attribute" level="one_three.color" target="three.color">black</field>
        <field style="element" level="one_three_four" target="four" >B</field>
        <field style="attribute" level="one_three_four.length" target="four.length">12 cm</field>
        <field style="attribute" level="one_three_four.width" target="four.width"> 7 cm</field>
        <field style="element" level="one_three_five" target="five" >C</field>
        <field style="element" level="one_six" target="six" ></field>
        <field style="attribute" level="one_six.size" target="six.size" >large</field>
        <field style="element" level="one_six_seven" target="seven" ></field>
        <field style="element" level="one_six_seven_eight" target="eight">D</field>
        <field style="element" level="one_nine" target="nine">E</field>
    </settings>
</data>

The flattening works like this:
(1) an underscore separates a parent element from its child element;
(2) a period separates an element from one of its attributes;
(3) the maximum depth isn't known, but it should be reasonable;
(4) elements that have child elements contain only those child elements, never a stand-alone text value.

This is what I would like to get:

<?xml version="1.0" encoding="UTF-8"?>
<template xmlns="http://www.example.org/standards/template/1"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:ac="http://www.example.org/Standards/abc/1"
          xsi:schemaLocation="http://www.example.org/standards/template.xsd"
          Version="2022-01">
  <one quality="high" weight="10 kg">
    <two>A</two>
    <three color="black">
      <four length="12 cm" width="7 cm">B</four>
      <five>C</five>
    </three>
    <six size="large">
      <seven>
        <eight>D</eight>
      </seven>
    </six>
    <nine>E</nine>
  </one>
</template>
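To make the two path rules concrete (underscore = child element, period = attribute), here is a minimal Python sketch, not XSLT, that splits a `level` value into its element path and optional attribute name. The function name `parse_level` is hypothetical, and the sketch assumes element and attribute names never contain `_` or `.` themselves, which holds for the sample data.

```python
from __future__ import annotations


def parse_level(level: str) -> tuple[list[str], str | None]:
    """Hypothetical helper: split a flattened `level` into element path + attribute.

    '_' separates a parent element from its child element;
    '.' separates an element from one of its attributes.
    Assumes names themselves never contain '_' or '.'.
    """
    # Everything after the first '.' (if present) is the attribute name.
    path, dot, attribute = level.partition(".")
    return path.split("_"), (attribute if dot else None)


for level in ["one", "one.weight", "one_three_four.length", "one_six_seven_eight"]:
    elements, attribute = parse_level(level)
    print(f"{level}: element path {'/'.join(elements)}, attribute {attribute}")
```

In XSLT 1.0 the same split needs `substring-before()`/`substring-after()` plus recursion, or an extension such as EXSLT `str:tokenize`, which is what the comments below ask about.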

Based on this SO answer, which also has no depth attribute, here is what I have tried (using XSLT 1.0). A lot of the data is missing from the output, and I can't figure out how to handle the attributes.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" doctype-public="XSLT-compat" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:key name="siblings" match="*[not(self::field)]" use="generate-id(preceding-sibling::field[1])" />
<xsl:key name="nextlevel" match="field" use="generate-id(preceding-sibling::field[@level][starts-with(current(), concat(., '_'))][1])" />

<xsl:template match="/">
    <template xmlns="http://www.example.org/standards/template/1" 
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xmlns:ac="http://www.example.org/Standards/abc/1"
                    xsi:schemaLocation="http://www.example.org/standards/template.xsd"
                    Version="2022-01">
        <one quality="{//field[@target='one.quality']}" weight="{//field[@target='one.weight']}">
            <xsl:apply-templates select="//field[@level='one']" />
        </one>
    </template>
</xsl:template>

<!-- Fetch elements -->
<xsl:template match="//field[@style='element']">
    <xsl:element name="{@target}">
        <xsl:apply-templates select="key('siblings', generate-id())" />
        <xsl:apply-templates select="key('nextlevel', generate-id())" />
    </xsl:element>
</xsl:template>

<!-- identity copy transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:transform>

This is what I am getting using the XSLT above:

<?xml version="1.0" encoding="UTF-8"?>
<template xmlns="http://www.example.org/standards/template/1"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:ac="http://www.example.org/Standards/abc/1"
          xsi:schemaLocation="http://www.example.org/standards/template.xsd"
          Version="2022-01">
   <one quality="high" weight="10 kg"/>
</template>
Cogicero
  • Which XSLT 1.0 processor do you target? Any EXSLT tokenize or similar support to break up those values like `level="one_three_four.length"`? – Martin Honnen May 10 '22 at 20:50
  • I didn't ask about platform-specific extension abilities, just whether there is some way to tokenize strings: libxslt with EXSLT support has that, and so does Xalan (at least the Java version). What does `system-property('xsl:vendor')` say? – Martin Honnen May 10 '22 at 20:57
  • See here how you can identify the processor: https://stackoverflow.com/a/25245033/3016153 – michael.hor257k May 10 '22 at 21:29
  • The logic that needs to be applied here is not at all clear to me. Please explain, step by step, how you would do this manually. – michael.hor257k May 10 '22 at 21:31
  • @michael.hor257k Thank you. Vendor is `Microsoft` and version is `1`. This is similar to the example I linked, except that attributes are involved and the delimiter is an underscore. How I would do this manually: `one` is always the top-level element under `template`. The attributes of `one` are picked up from `one.quality` and `one.weight`. Inside that element, I loop over the items, using `level` as a guide. If an item has style=`element`, I create a new element whose name is given by `target`. If an item has style=`attribute`, with a target value of the form `element.attrib`, I add the attribute `attrib` to the element `element`. – Cogicero May 10 '22 at 21:43
  • IIUC, you have attributes that have no parent element (e.g. `three`) and you need to somehow create this element. Is this correct, or is it an omission? What determines which elements are named explicitly and which need to be derived from their respective attributes? – michael.hor257k May 10 '22 at 22:02
  • @michael.hor257k Yes, one, three, six and seven are simply containers of other elements; they have no text data (unlike the others, which have either text or an attribute in text form). The others are named explicitly because they contain text data. It’s a compact representation coming from the other end. AFAIK there is a verbose option they can send, which also explicitly includes rows that have no text data. But this is a simplified example; the actual one is much larger, so I am using the compact form to save space. Would it make the problem easier to approach if every item were explicitly defined? – Cogicero May 10 '22 at 22:24
  • I think it's very likely (I would need to see an example to be sure). Currently you have to not only surmise the existence of an element from its attributes' paths, but also group these partial paths so that you avoid creating a separate element for each of its attributes. This is a lot of work, esp. with your processor. – michael.hor257k May 10 '22 at 23:22
  • @michael.hor257k Thanks! I added the verbose form. Same explanation as before, but it's probably easier to see now. The underscores indicate a child element, e.g. the level `one_nine` means `nine` is a direct child of `one`. If an element has any attributes, they come right after it. Do you have any hints for debugging the XSLT? – Cogicero May 10 '22 at 23:34
  • @Cogicero It's not a matter of "easier to see". Your original XML sample and your new one are fundamentally different. Knowing whether the "non-verbose" form can actually occur has a lot of impact on the solution. – Tomalak May 11 '22 at 07:26
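The manual procedure described in the comments (create an element for each `element` row, attach each `attribute` row to its owning element) can be sketched outside XSLT with Python's standard `xml.etree.ElementTree` module. This is an illustration only; `expand_single_pass`, `NS` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the verbose input. It assumes the verbose form, where every parent row appears before its children. Fed the compact form, the lookup of a missing parent such as `one` raises `KeyError`, which is exactly the difficulty raised in the comments about the compact form.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace copied from the desired output in the question.
NS = "http://www.example.org/standards/template/1"

# A trimmed copy of the question's *verbose* input (subset of rows).
SAMPLE = """<data><settings>
  <field style="element" level="one" target="one"></field>
  <field style="attribute" level="one.quality" target="one.quality">high</field>
  <field style="element" level="one_three" target="three"></field>
  <field style="element" level="one_three_four" target="four">B</field>
  <field style="attribute" level="one_three_four.width" target="four.width"> 7 cm</field>
  <field style="element" level="one_nine" target="nine">E</field>
</settings></data>"""


def expand_single_pass(data_xml: str) -> ET.Element:
    """Hypothetical helper: one pass over the rows, in document order."""
    template = ET.Element(f"{{{NS}}}template", {"Version": "2022-01"})
    by_level: dict[str, ET.Element] = {}  # level path -> element already created
    for field in ET.fromstring(data_xml).iter("field"):
        level = field.get("level", "")
        value = (field.text or "").strip()  # " 7 cm" -> "7 cm"
        if field.get("style") == "element":
            # "one_three_four" -> parent "one_three"; "one" has no parent.
            parent_level, sep, _ = level.rpartition("_")
            parent = by_level[parent_level] if sep else template
            elem = ET.SubElement(parent, f"{{{NS}}}{field.get('target')}")
            if value:
                elem.text = value
            by_level[level] = elem
        else:
            # "one_three_four.width" -> owner "one_three_four", attribute "width".
            owner_level, _, attribute = level.partition(".")
            by_level[owner_level].set(attribute, value)
    return template


ET.register_namespace("", NS)
print(ET.tostring(expand_single_pass(SAMPLE), encoding="unicode"))
```

Stripping the text is a choice made here so that `" 7 cm"` becomes `7 cm`, as in the desired output.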

1 Answer


It seems to me that given your "verbose" input, you could achieve the expected result quite easily using:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.example.org/standards/template/1" >
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="elem" match="field[@style='element']" use="substring-before(@level, @target)" />
<xsl:key name="attr" match="field[@style='attribute']" use="substring-before(@level, '.')" />

<xsl:template match="/">
    <template xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:ac="http://www.example.org/Standards/abc/1"
              xsi:schemaLocation="http://www.example.org/standards/template.xsd"
              Version="2022-01">
        <xsl:apply-templates select="key('elem', '')" />
    </template>
</xsl:template>

<xsl:template match="field[@style='element']">
    <xsl:element name="{@target}">
        <xsl:apply-templates select="key('attr', @level)" />
        <xsl:value-of select="." />
        <xsl:apply-templates select="key('elem', concat(@level, '_'))" />
    </xsl:element>
</xsl:template>

<xsl:template match="field[@style='attribute']">
    <xsl:attribute name="{substring-after(@target, '.')}">
        <xsl:value-of select="." />
    </xsl:attribute>
</xsl:template>

</xsl:stylesheet>

Note that there is no hard-coding of any node names, and the hierarchy depth is unlimited.
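For readers who want to trace how the two keys drive the recursion, here is a rough Python equivalent using `xml.etree.ElementTree` (hypothetical names `expand_with_keys`, `substring_before`, `NS`, `SAMPLE`; `SAMPLE` is a subset of the verbose input). The `elem` key groups element rows by their parent prefix (e.g. `one_six_` for `one_six_seven`), and the `attr` key groups attribute rows by their owning element's `level`. Like `xsl:value-of`, it copies text unchanged, so `" 7 cm"` would keep its leading space.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "http://www.example.org/standards/template/1"

SAMPLE = """<data><settings>
  <field style="element" level="one" target="one"></field>
  <field style="attribute" level="one.weight" target="one.weight">10 kg</field>
  <field style="element" level="one_six" target="six"></field>
  <field style="attribute" level="one_six.size" target="six.size">large</field>
  <field style="element" level="one_six_seven" target="seven"></field>
  <field style="element" level="one_six_seven_eight" target="eight">D</field>
  <field style="element" level="one_nine" target="nine">E</field>
</settings></data>"""


def substring_before(s: str, sub: str) -> str:
    """Same result as XPath 1.0 substring-before(): '' when sub is absent."""
    i = s.find(sub)
    return s[:i] if i >= 0 else ""


def expand_with_keys(data_xml: str) -> ET.Element:
    fields = list(ET.fromstring(data_xml).iter("field"))
    # Emulates <xsl:key name="elem" ... use="substring-before(@level, @target)"/>
    elem_key: dict[str, list[ET.Element]] = defaultdict(list)
    # Emulates <xsl:key name="attr" ... use="substring-before(@level, '.')"/>
    attr_key: dict[str, list[ET.Element]] = defaultdict(list)
    for f in fields:
        if f.get("style") == "element":
            elem_key[substring_before(f.get("level", ""), f.get("target", ""))].append(f)
        else:
            attr_key[substring_before(f.get("level", ""), ".")].append(f)

    def build(field: ET.Element, parent: ET.Element) -> None:
        # Mirrors the template for field[@style='element'].
        level = field.get("level", "")
        elem = ET.SubElement(parent, f"{{{NS}}}{field.get('target')}")
        for a in attr_key.get(level, []):  # key('attr', @level)
            elem.set(a.get("target", "").partition(".")[2], a.text or "")
        if field.text:  # xsl:value-of select="."
            elem.text = field.text
        for child in elem_key.get(level + "_", []):  # key('elem', concat(@level, '_'))
            build(child, elem)

    template = ET.Element(f"{{{NS}}}template", {"Version": "2022-01"})
    for top in elem_key.get("", []):  # key('elem', '')
        build(top, template)
    return template


ET.register_namespace("", NS)
print(ET.tostring(expand_with_keys(SAMPLE), encoding="unicode"))
```

One caveat shared with the XSLT: `substring-before(@level, @target)` uses the first occurrence of the target name, so a path in which that name also appears earlier (a hypothetical `one_one`) would be keyed under the wrong parent.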

michael.hor257k
  • Thank you so much @michael.hor257k! Sounds like I have to go with the verbose form, then. The data potentially runs into hundreds of thousands of rows, and the XML can be sparse (some deeply nested levels), so the compact form would have saved data. I’ll look into whether I can use XSLT to convert the compact form into the verbose one, but for now this is amazing! – Cogicero May 11 '22 at 07:31
  • Since the compact form has all the required information, and XSLT 1.0 is Turing-complete, it should be possible to produce the same result using the compact form as the input. I just don't see a way to do it without an awful amount of work. Converting the compact form into verbose would require essentially the same effort. – michael.hor257k May 11 '22 at 08:04
  • Thank you for your patience and kindness, but I greatly simplified the question above. Before this, I had a "splitter" template used to split items in a list, but after applying your answer my splitter no longer works. So I have created a new question which should get me all the way - if you have any further insights, pls let me know! https://stackoverflow.com/questions/72207414/clone-nodes-while-expanding-flat-to-hierarchical-xml-using-microsoft-xslt-1-0 – Cogicero May 11 '22 at 20:46
  • I have looked at it, but I don't see an elegant way to handle it. It too will require a lot of work. Could you not ask your data provider to use a more convenient format? Using a delimited string for multiple values is especially reprehensible; the proper way to use XML is to make use of the provided structure of elements and attributes. – michael.hor257k May 12 '22 at 04:32
  • Thank you - but unfortunately that's not an option here. I have made a lot of progress by banging my head against the wall and doing some trial and error. I am so close to being done now, but I am still missing something about how to fetch substrings. If you have the time to help me take a look I would really appreciate it! I updated the question linked above, and for your convenience, here is the fiddle: https://xsltfiddle.liberty-development.net/6qtiBn6/2 – Cogicero May 12 '22 at 22:08
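On the idea of converting the compact form into the verbose one first: the missing container rows can be derived from the path prefixes. A minimal Python sketch of that pre-processing step (not XSLT; `compact_to_verbose` and `COMPACT` are hypothetical names), assuming, as stated in the comments, that an element's own row, when it exists, comes before its attribute rows, so no container row is emitted twice:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The question's compact input (XML declaration omitted).
COMPACT = """<data><settings>
  <field style="attribute" level="one.quality" target="one.quality">high</field>
  <field style="attribute" level="one.weight" target="one.weight">10 kg</field>
  <field style="element" level="one_two" target="two">A</field>
  <field style="attribute" level="one_three.color" target="three.color">black</field>
  <field style="element" level="one_three_four" target="four">B</field>
  <field style="attribute" level="one_three_four.length" target="four.length">12 cm</field>
  <field style="attribute" level="one_three_four.width" target="four.width"> 7 cm</field>
  <field style="element" level="one_three_five" target="five">C</field>
  <field style="attribute" level="one_six.size" target="six.size">large</field>
  <field style="element" level="one_six_seven_eight" target="eight">D</field>
  <field style="element" level="one_nine" target="nine">E</field>
</settings></data>"""


def compact_to_verbose(data_xml: str) -> ET.Element:
    """Hypothetical helper: insert an empty element row for every implied container."""
    src_settings = ET.fromstring(data_xml).find("settings")
    data = ET.Element("data")
    settings = ET.SubElement(data, "settings")
    seen: set[str] = set()  # element levels already emitted
    for field in src_settings.findall("field"):
        level = field.get("level", "")
        segments = level.partition(".")[0].split("_")
        is_element = field.get("style") == "element"
        # Element row: all proper prefixes must exist first.
        # Attribute row: its owning element must exist too.
        stop = len(segments) - 1 if is_element else len(segments)
        for i in range(1, stop + 1):
            prefix = "_".join(segments[:i])
            if prefix not in seen:
                seen.add(prefix)
                ET.SubElement(settings, "field",
                              {"style": "element", "level": prefix, "target": segments[i - 1]})
        if is_element:
            seen.add(level)
        settings.append(field)  # keep the original row after its ancestors
    return data


for f in compact_to_verbose(COMPACT).iter("field"):
    print(f.get("style"), f.get("level"), f.get("target"), repr(f.text))
```

Run on the compact sample, this yields the same 15 `level` values, in the same order, as the verbose sample in the question; whether the same prefix bookkeeping is practical in Microsoft's XSLT 1.0 processor is a separate question.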