I have a huge XML-formatted configuration file. The system doesn't care about the order of tags, but we humans do! (Primarily for the purpose of version comparisons.) I already received the XSLT below which works well, but I've discovered that it's not quite enough.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates>
      <xsl:sort select="(@name, name())[1]"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

I want to sort all tags recursively by the value of their name attribute (this works!) but because the attribute is not always present, it must also sort by further attributes, any of which may or may not be present in any given element.

I have basically zero understanding of XSLT so I'm experimenting. I've hacked the above into this, but it doesn't work as desired. The result of this seems to be identical to the above.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates>
      <xsl:sort select="@name"/>
      <xsl:sort select="@row"      data-type="number"/>
      <xsl:sort select="@col"      data-type="number"/>
      <xsl:sort select="@sequence" data-type="number"/>
      <xsl:sort select="@tabindex" data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

My data looks similar to this. The problem is that the cell elements are not sorted at all (within their grid group) because they have no name attribute. This is why I'd like to extend the sorting logic to use the name attribute when present, and otherwise sort by additional attributes such as tabindex. Within any given group, the same attributes can be assumed to be present.

<sections>
  <section name="SomeList">
    <caption>
      <![CDATA[Candidates]]>
    </caption>
    ...
    <parameters>
      <parameter name="pageSize">
        <![CDATA[50]]>
      </parameter>
    </parameters>
    ... 
    <grid>
      <cell row="0" col="7" tabindex="9" colspan="10">
        <field name="Entered" />
      </cell>
    </grid>
  </section>
</sections>

Update:
With Vincent's very good help, I've created a sorting that works well enough for our purposes. Here it is.

Torben Gundtofte-Bruun

3 Answers

This answer assumes that you don't have any mixed content in your data. It only takes into account the first two sort steps (@name and @row); you can adapt it for further steps. It could probably be rewritten as a recursive named template that takes the list of sort parameters as input. Could you provide an XML sample if my XSLT doesn't work for you?

XSLT 2.0 sample:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:template match="*">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:for-each-group select="*" group-by="if (exists(@name)) then @name else ''">
                    <xsl:sort select="current-grouping-key()" data-type="text"/>
                    <xsl:for-each-group select="current-group()" group-by="if (exists(@row)) then @row else -1">
                        <xsl:sort select="current-grouping-key()" data-type="number"/>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:for-each-group>
                </xsl:for-each-group>
            </xsl:copy>
        </xsl:template>
</xsl:stylesheet>

Note that the code iterates over groups of elements that share the same value, so elements that lack a given attribute all end up in one group (under the fallback key).

I used the following XML as input:

<?xml version="1.0" encoding="UTF-8"?>
<items>
    <item row="5" col="9"></item>
    <item name="d" row="20" col="12" tabindex="" sequence=""></item>
    <item row="1" col="5" ></item>
    <item name="d" row="5" col="6" ></item>
    <item name="a" row="7" col="8" ></item>
    <item name="s" row="1" col="5" ></item>
    <item name="c" row="5" col="9"></item>
    <item row="2" col="5" ></item>
    <item row="20" col="9"></item>
    <item row="0" col="9"></item>
    <item name="s" row="2" col="10" tabindex="" sequence=""></item>
    <item name="z" row="8" col="15" tabindex="" sequence=""></item>    
</items>

I get the following result:

<?xml version="1.0" encoding="UTF-8"?>
<items>
   <item row="0" col="9"/>
   <item row="1" col="5"/>
   <item row="2" col="5"/>
   <item row="5" col="9"/>
   <item row="20" col="9"/>
   <item name="a" row="7" col="8"/>
   <item name="c" row="5" col="9"/>
   <item name="d" row="5" col="6"/>
   <item name="d" row="20" col="12" tabindex="" sequence=""/>
   <item name="s" row="1" col="5"/>
   <item name="s" row="2" col="10" tabindex="" sequence=""/>
   <item name="z" row="8" col="15" tabindex="" sequence=""/>
</items>
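
The remark about adapting the stylesheet for further steps can be sketched as follows. This is only a minimal sketch, not tested against the asker's real configuration file: it nests one more xsl:for-each-group for @col inside the @row level, reusing the same fallback key of -1 for elements that lack the attribute. Like the stylesheet above, it selects only child elements, so elements that contain nothing but text (such as caption in the question) would need extra handling.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- Level 1: group by @name; elements without it share the key '' -->
            <xsl:for-each-group select="*" group-by="if (exists(@name)) then @name else ''">
                <xsl:sort select="current-grouping-key()" data-type="text"/>
                <!-- Level 2: within each name group, group by @row (fallback -1) -->
                <xsl:for-each-group select="current-group()" group-by="if (exists(@row)) then @row else -1">
                    <xsl:sort select="current-grouping-key()" data-type="number"/>
                    <!-- Level 3: within each row group, group by @col (fallback -1) -->
                    <xsl:for-each-group select="current-group()" group-by="if (exists(@col)) then @col else -1">
                        <xsl:sort select="current-grouping-key()" data-type="number"/>
                        <!-- Recurse so that descendants are sorted the same way -->
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:for-each-group>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Each additional attribute costs one more nesting level, which is the main drawback of this approach once the list of sort attributes grows.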
Vincent Biragnet
  • The output you mention looks exactly the way I want it! Unfortunately, I don't get your sample output when I run your sample input through your sample XSLT. [I get this.](http://pastebin.com/Mxq3wKeC) Looks like I need to troubleshoot on my side first! – Torben Gundtofte-Bruun Nov 25 '11 at 11:00
  • For more clarity in the output, try adding the following instruction as the first child of the stylesheet: `<xsl:output indent="yes"/>`. You will get the same indented result as above. Then try running from the command line: `java -jar saxon.jar test.xml test.xsl`. Let me know what you get as output. – Vincent Biragnet Nov 25 '11 at 11:11
  • Looking at your XML, I was wondering whether each kind of element (parameter, section, cell...) always has the same attributes present. Maybe each element name just needs its own kind of sorting, which is quite a bit easier than the general task. – Vincent Biragnet Nov 25 '11 at 11:15
  • Re indent: The `indent` trick helped; I was using HTML Tidy as well and that breaks things, but I think just using the `indent` will be perfect. – Torben Gundtofte-Bruun Nov 25 '11 at 11:18
  • Re attributes: Our XML has different blocks, and _within_ each block, the attributes are the same, but differ between blocks. E.g. all elements in a `grid` block have the same kind of attributes (row/col/tabindex), but these are different from elements in a `parameters` block (name). The trouble is that we have 50+ different kinds of blocks, so a generic solution seems better than having to set up 50+ individual rules. – Torben Gundtofte-Bruun Nov 25 '11 at 11:20
  • But you will end up with a lot of unnecessary nesting levels for the different attributes. – Vincent Biragnet Nov 25 '11 at 11:29

Consider this XSLT for specific elements with given mandatory attributes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates>
                <xsl:sort select="(@name, name())[1]"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="grid">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:for-each-group select="*" group-by="if (exists(@row)) then @row else -1">
                <xsl:sort select="current-grouping-key()" data-type="number"/>
                <xsl:for-each-group select="current-group()" group-by="if (exists(@col)) then @col else -1">
                    <xsl:sort select="current-grouping-key()" data-type="number"/>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

My example should cover the sorting of sections and parameters with the first template, which matches *, and the sorting of grid children by row and col with the second. You can extend it to any other elements that have different sorting attributes by duplicating the template.

If several elements share the same sorting attributes, use match="elt1|elt2|elt3".
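
As a minimal sketch of that last point, the stylesheet below applies the row/col rule to several block types through one union pattern. Only grid comes from the question; matrix and layout are hypothetical element names used purely for illustration, and the sketch has not been run against the real configuration file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>
    <!-- Default rule: sort children by @name, falling back to the element name -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates>
                <xsl:sort select="(@name, name())[1]"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <!-- One rule for every block whose children carry row/col.
         "matrix" and "layout" are hypothetical names, not taken from the question. -->
    <xsl:template match="grid|matrix|layout">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:for-each-group select="*" group-by="if (exists(@row)) then @row else -1">
                <xsl:sort select="current-grouping-key()" data-type="number"/>
                <xsl:for-each-group select="current-group()" group-by="if (exists(@col)) then @col else -1">
                    <xsl:sort select="current-grouping-key()" data-type="number"/>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

The union pattern has a higher default priority than the match on *, so the listed elements take the row/col rule and everything else falls through to the default rule.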

Vincent Biragnet

Here is a generic, simple and not long (60 well-formatted lines) solution.

Sorting is performed on all wanted attributes and this doesn't require any manual duplication of templates:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my" exclude-result-prefixes="my">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:param name="pSortTypes" as="element()*">
      <attr name="name" type="alpha" maxLength="15"/>
      <attr name="row" type="numeric" maxLength="6"/>
      <attr name="col" type="numeric" maxLength="4"/>
      <attr name="tabindex" type="numeric" maxLength="2"/>
      <attr name="sequence" type="numeric" maxLength="3"/>
    </xsl:param>

 <xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates select="*">
     <xsl:sort select="my:OrderedAttributeTuple(.)"/>
    </xsl:apply-templates>
  </xsl:copy>
 </xsl:template>

 <xsl:function name="my:OrderedAttributeTuple" as="xs:string">
  <xsl:param name="pElem" as="element()"/>

  <xsl:variable name="vResult" as="xs:string*">
      <xsl:apply-templates select="$pSortTypes">
       <xsl:with-param name="pElem" select="$pElem"/>
      </xsl:apply-templates>
  </xsl:variable>

  <xsl:sequence select="string-join($vResult, '')"/>
 </xsl:function>

 <xsl:template match="attr">
  <xsl:param name="pElem" as="element()"/>

  <xsl:variable name="vVal" select=
       "string($pElem/@*[name() eq current()/@name])"/>

  <xsl:variable name="vPad" as="xs:string*" select=
   "for $cnt in xs:integer(@maxLength) - string-length($vVal),
        $i in 1 to $cnt
     return '.'
   "/>

   <xsl:variable name="vPadding" select="string-join($vPad, '')"/>

   <xsl:variable name="vTuple">
       <xsl:sequence select=
        "if(@type eq 'alpha')
           then concat($vVal, $vPadding)
           else concat($vPadding, $vVal)
        "/>
    </xsl:variable>

   <xsl:sequence select="string($vTuple)"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on this XML document:

<items>
    <item row="5" col="9"/>
    <item name="d" row="20" col="12" tabindex="" sequence=""/>
    <item row="1" col="5" />
    <item name="d" row="5" col="6" />
    <item name="a" row="7" col="8" />
    <item name="s" row="1" col="5" tabindex="3" sequence="4"/>
    <item name="s" row="3" col="3" tabindex="3" sequence="4"/>
    <item name="c" row="5" col="9"/>
    <item row="2" col="5" />
    <item row="20" col="9"/>
    <item row="0" col="9"/>
    <item name="s" row="3" col="3" tabindex="1" sequence="2"/>
    <item name="s" row="2" col="10" tabindex="1" sequence="2"/>
    <item name="z" row="8" col="15" tabindex="" sequence=""/>
</items>

the wanted, correctly sorted result is produced:

<items>
   <item row="0" col="9"/>
   <item row="1" col="5"/>
   <item row="2" col="5"/>
   <item row="5" col="9"/>
   <item row="20" col="9"/>
   <item name="a" row="7" col="8"/>
   <item name="c" row="5" col="9"/>
   <item name="d" row="5" col="6"/>
   <item name="d" row="20" col="12" tabindex="" sequence=""/>
   <item name="s" row="1" col="5" tabindex="3" sequence="4"/>
   <item name="s" row="2" col="10" tabindex="1" sequence="2"/>
   <item name="s" row="3" col="3" tabindex="1" sequence="2"/>
   <item name="s" row="3" col="3" tabindex="3" sequence="4"/>
   <item name="z" row="8" col="15" tabindex="" sequence=""/>
</items>

Do note:

  1. Sorting is performed on all attributes specified in an external parameter ($pSortTypes). Compare this to the currently accepted answer, which only sorts on @name and @row and requires hardcoding of the order and sort data-type.

  2. The exact wanted sorting order of the attributes can be specified. It is their order as in $pSortTypes.

  3. The sort data-type for each attribute is specified in the type attribute in $pSortTypes (currently just "alpha" and "numeric").

  4. The maximum length of the string representation of the value of an attribute is specified as the maxLength attribute in $pSortTypes. This is used for correct padding/alignment and also increases the sorting efficiency.

  5. This demonstrates how to solve even the most complicated sorting problems by having a user-defined xsl:function (in this case my:OrderedAttributeTuple()) that generates a single sort key.
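
To illustrate points 1 and 2, here is a hedged sketch of an alternative $pSortTypes that would sort by row first, then col, then name. The particular ordering and the decision to drop tabindex and sequence are assumptions made for the example, not something the question asks for. With these settings, the key generated for `<item name="d" row="5" col="6"/>` would be five dots followed by 5, then three dots followed by 6, then d followed by fourteen dots, all concatenated into one string.

```xml
<!-- Drop-in replacement for the xsl:param in the stylesheet above.
     The order of the attr elements is the sort priority: row, then col, then name. -->
<xsl:param name="pSortTypes" as="element()*">
  <!-- numeric values are padded on the left up to maxLength characters -->
  <attr name="row" type="numeric" maxLength="6"/>
  <attr name="col" type="numeric" maxLength="4"/>
  <!-- alpha values are padded on the right up to maxLength characters -->
  <attr name="name" type="alpha" maxLength="15"/>
</xsl:param>
```

Values longer than maxLength receive no padding, so the alignment that the single string key relies on only holds if each maxLength is at least as large as the longest value actually present in the data.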

Dimitre Novatchev