1

I am struggling with a simple task. The following XML file

<Root>
    <Row>
        <ConceptID>1</ConceptID>
        <Concept>may be empty</Concept>
        <TermID>2481</TermID>
        <Term>screened room</Term>
        <Language>EN</Language>
        <Usage>forbidden</Usage>
        <StatusLanguage>new</StatusLanguage>
        <Source>HEKT385057</Source>
    </Row>
    <Row>
        <ConceptID>1</ConceptID>
        <Concept>may be empty</Concept>
        <TermID>6551</TermID>
        <Term>shielded room</Term>
        <Language>EN</Language>
        <Usage>allowed</Usage>
        <StatusLanguage>new</StatusLanguage>
        <Source>EKT-TD</Source>
    </Row>
    <Row>
        <ConceptID>1</ConceptID>
        <Concept>may be empty</Concept>
        <TermID>6552</TermID>
        <Term>unverseuchter Raum</Term>
        <Language>DE</Language>
        <Usage>allowed</Usage>
        <StatusLanguage>new</StatusLanguage>
        <Source>EKT-40</Source>
    </Row>
    <Row>
        <ConceptID>2</ConceptID>
        <Concept>may be also empty</Concept>
        <TermID>2482</TermID>
        <Term>low-pressure ventilator</Term>
        <Language>EN</Language>
        <Usage>allowed</Usage>
        <StatusLanguage>new</StatusLanguage>
        <Source>Birgit</Source>
    </Row>
    <Row>
        <ConceptID>2</ConceptID>
        <Concept>may be also empty</Concept>
        <TermID>2483</TermID>
        <Term>LP ventilator</Term>
        <Language>EN</Language>
        <Usage>allowed</Usage>
        <StatusLanguage>new</StatusLanguage>
        <Source>HEKT385057</Source>
    </Row>
...
</Root>

is what I want to transform into a new XML file with the following structure, grouped by ConceptID:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <NewConcept>
      <ConceptID>1</ConceptID>
      <Concept>may be empty</Concept>
      <TermG>
         <TermID>6551</TermID>
         <Term>shielded room</Term>
         <Language>EN</Language>
         <Usage>allowed</Usage>
         <StatusLanguage>new</StatusLanguage>
         <Source>EKT-TD</Source>
      </TermG>
      <TermG>
         <TermID>6552</TermID>
         <Term>unverseuchter Raum</Term>
         <Language>DE</Language>
         <Usage>allowed</Usage>
         <StatusLanguage>new</StatusLanguage>
         <Source>EKT-40</Source>
      </TermG>
      <TermG>
         <TermID>2481</TermID>
         <Term>screened room</Term>
         <Language>EN</Language>
         <Usage>forbidden</Usage>
         <StatusLanguage>new</StatusLanguage>
         <Source>HEKT385057</Source>
      </TermG>
   </NewConcept>
   <NewConcept>
      <ConceptID>2</ConceptID>
      <Concept>may be also empty</Concept>
      <TermG>
         <TermID>2482</TermID>
         <Term>low-pressure ventilator</Term>
         <Language>EN</Language>
         <Usage>allowed</Usage>
         <StatusLanguage>new</StatusLanguage>
         <Source>Birgit</Source>
      </TermG>
      <TermG>
         <TermID>2483</TermID>
         <Term>LP ventilator</Term>
         <Language>EN</Language>
         <Usage>allowed</Usage>
         <StatusLanguage>new</StatusLanguage>
         <Source>HEKT385057</Source>
      </TermG>
   </NewConcept>
...
</Root>

My current XSL file, however, only copies the tags into the desired structure, not their content:

    <xsl:key name="concept" match="Row" use="ConceptID" />
     <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="Row[generate-id(.)=generate-id(key('concept',ConceptID)[1])]">
            <xsl:sort select="ConceptID" data-type="number"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Row">
        <NewConcept>
            <xsl:apply-templates select="ConceptID" />
            <xsl:apply-templates select="Concept" />
            <xsl:for-each select="key('concept', ConceptID)">
            <xsl:sort select="Usage"/>
                <TermG>     
                    <xsl:apply-templates select="TermID" />
                    <xsl:apply-templates select="Term" />
                    <xsl:apply-templates select="Language" />
                    <xsl:apply-templates select="Usage" />
                    <xsl:apply-templates select="StatusLanguage" />
                    <xsl:apply-templates select="Source" />
                </TermG>
            </xsl:for-each>
        </NewConcept>
    </xsl:template>

This yields:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <NewConcept>
      <ConceptID/>
      <Concept/>
      <TermG>
         <TermID/>
         <Term/>
         <Language/>
         <Usage/>
         <StatusLanguage/>
         <Source/>
      </TermG>
      <TermG>
         <TermID/>
         <Term/>
         <Language/>
         <Usage/>
         <StatusLanguage/>
         <Source/>
      </TermG>
      <TermG>
         <TermID/>
         <Term/>
         <Language/>
         <Usage/>
         <StatusLanguage/>
         <Source/>
      </TermG>
   </NewConcept>
...
</Root>

Replacing


<xsl:apply-templates select="Row[generate-id(.)=generate-id(key('concept',ConceptID)[1])]">
    <xsl:sort select="ConceptID" data-type="number"/>
</xsl:apply-templates>

with

<xsl:apply-templates select="@*|node()"/>

gives me the correct output (structure and content), but each group appears multiple times, once per element in the group (a group of three elements, for example, is output three times). I would very much appreciate a hint that helps me solve this task. Thank you very much.

MikeH
  • Please explain the logic you are trying to implement. Why does your expected output contain only the first group by `ConceptID`? – michael.hor257k Feb 01 '21 at 10:27
  • The XML file represents a Terminology Database. A Concept consists of Terms, multiple Terms, either in a different language or allowed and forbidden Terms. So the first Concept with the ID=1 contains three Terms, which I want to group. Language, Usage, Status and Source are attributes of that particular Term and I want to keep them under a new tag called TermG – MikeH Feb 01 '21 at 10:43
  • This does not answer my question. – michael.hor257k Feb 01 '21 at 10:45
  • Sorry then I do not understand your question. I did not post the complete XML file. My output also of course contains ID2 and all following ones, I only omitted them in the post – MikeH Feb 01 '21 at 10:47
  • Okay now I added also the second ID – MikeH Feb 01 '21 at 10:59

3 Answers

2

Don't modify the identity template.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="utf-8" />

    <xsl:key name="kRowByConceptID" match="Row" use="ConceptID" />

     <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="Root">
        <xsl:copy>
            <xsl:apply-templates select="Row[
                generate-id() = generate-id(key('kRowByConceptID', ConceptID))
            ]">
                <xsl:sort select="ConceptID" data-type="number"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Row">
        <NewConcept>
            <xsl:apply-templates select="ConceptID" />
            <xsl:apply-templates select="Concept" />
            <xsl:apply-templates select="key('kRowByConceptID', ConceptID)" mode="TermG">
                <xsl:sort select="Usage" />
            </xsl:apply-templates>
        </NewConcept>
    </xsl:template>
    
    <xsl:template match="Row" mode="TermG">
        <TermG>     
            <xsl:apply-templates select="TermID" />
            <xsl:apply-templates select="Term" />
            <xsl:apply-templates select="Language" />
            <xsl:apply-templates select="Usage" />
            <xsl:apply-templates select="StatusLanguage" />
            <xsl:apply-templates select="Source" />
        </TermG>        
    </xsl:template>
</xsl:stylesheet>

produces

<Root>
  <NewConcept>
    <ConceptID>1</ConceptID>
    <Concept>may be empty</Concept>
    <TermG>
      <TermID>6551</TermID>
      <Term>shielded room</Term>
      <Language>EN</Language>
      <Usage>allowed</Usage>
      <StatusLanguage>new</StatusLanguage>
      <Source>EKT-TD</Source>
    </TermG>
    <TermG>
      <TermID>6552</TermID>
      <Term>unverseuchter Raum</Term>
      <Language>DE</Language>
      <Usage>allowed</Usage>
      <StatusLanguage>new</StatusLanguage>
      <Source>EKT-40</Source>
    </TermG>
    <TermG>
      <TermID>2481</TermID>
      <Term>screened room</Term>
      <Language>EN</Language>
      <Usage>forbidden</Usage>
      <StatusLanguage>new</StatusLanguage>
      <Source>HEKT385057</Source>
    </TermG>
  </NewConcept>
  <NewConcept>
    <ConceptID>2</ConceptID>
    <Concept>may be also empty</Concept>
    <TermG>
      <TermID>2482</TermID>
      <Term>low-pressure ventilator</Term>
      <Language>EN</Language>
      <Usage>allowed</Usage>
      <StatusLanguage>new</StatusLanguage>
      <Source>Birgit</Source>
    </TermG>
    <TermG>
      <TermID>2483</TermID>
      <Term>LP ventilator</Term>
      <Language>EN</Language>
      <Usage>allowed</Usage>
      <StatusLanguage>new</StatusLanguage>
      <Source>HEKT385057</Source>
    </TermG>
  </NewConcept>
</Root>
Tomalak
  • Thanks for your solution. It is impressive how quickly you all came up with one. A test run with my complete database produced exactly what I had hoped to achieve! And I like the use of Muench's method (which I still do not completely understand). – MikeH Feb 01 '21 at 11:50
  • @MikeH I've deliberately created a verbose solution, so that the individual steps stand on their own. There are ways to compress the code, as Michael's solution demonstrates. If you want to read more about how grouping works, read [an old answer of mine](https://stackoverflow.com/a/955527/18771) where I give an explanation of `<xsl:key>` using JavaScript as an example. – Tomalak Feb 01 '21 at 12:09
  • Thanks for your explanation. I like how your solution makes the individual steps easy to follow. And thanks for the link. I feel I have somehow grasped it, but then I get confused again when a new task comes up. – MikeH Feb 01 '21 at 12:15
  • @MikeH You'll get the hang of it, it's not difficult once you've wrapped your head around it. – Tomalak Feb 01 '21 at 12:27
  • @Tomalak, I believe so too… – MikeH Feb 01 '21 at 12:49
2

Muenchian grouping - which is what you're trying to implement here - has 2 parts:

  1. Creating a group for each distinct value;
  2. Populating the group with nodes that have the same value.

You are doing the 1st part almost correctly here:

<xsl:apply-templates select="Row[generate-id(.)=generate-id(key('concept',ConceptID)[1])]">

I say "almost" because you are doing this in a template that matches any node/attribute, which makes no sense. You only want to do the grouping once.

OTOH, you make no effort to implement the 2nd part.

Here is how you could get the expected result simply and concisely:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="concept" match="Row" use="ConceptID" />

<xsl:template match="/Root">
    <Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <!-- create a group for each distinct ConceptID -->
        <xsl:for-each select="Row[generate-id()=generate-id(key('concept', ConceptID)[1])]">
            <xsl:sort select="ConceptID" data-type="number"/>
            <NewConcept>
                <xsl:copy-of select="ConceptID | Concept"/>
                <!-- populate the group with rows with the current ConceptID -->
                <xsl:for-each select="key('concept', ConceptID)">
                    <xsl:sort select="Usage"/>
                    <TermG>
                        <xsl:copy-of select="*[not(self::ConceptID or self::Concept)]"/>
                    </TermG>
                </xsl:for-each>
            </NewConcept>
        </xsl:for-each>
    </Root>
</xsl:template>

</xsl:stylesheet>
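
For readers who find the two parts easier to follow outside XSLT, here is a minimal Python sketch of the same logic. It is only an analogy, not part of the stylesheet above: the dictionary `index` stands in for `xsl:key`, the `is` comparison stands in for the `generate-id()` test, and the names `SAMPLE` and `group_rows` are made up for this illustration. The sample data is an abridged copy of the rows from the question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged copy of the question's input (some child elements omitted).
SAMPLE = """<Root>
<Row><ConceptID>1</ConceptID><Concept>may be empty</Concept><TermID>2481</TermID><Term>screened room</Term><Language>EN</Language><Usage>forbidden</Usage></Row>
<Row><ConceptID>1</ConceptID><Concept>may be empty</Concept><TermID>6551</TermID><Term>shielded room</Term><Language>EN</Language><Usage>allowed</Usage></Row>
<Row><ConceptID>1</ConceptID><Concept>may be empty</Concept><TermID>6552</TermID><Term>unverseuchter Raum</Term><Language>DE</Language><Usage>allowed</Usage></Row>
<Row><ConceptID>2</ConceptID><Concept>may be also empty</Concept><TermID>2482</TermID><Term>low-pressure ventilator</Term><Language>EN</Language><Usage>allowed</Usage></Row>
<Row><ConceptID>2</ConceptID><Concept>may be also empty</Concept><TermID>2483</TermID><Term>LP ventilator</Term><Language>EN</Language><Usage>allowed</Usage></Row>
</Root>"""


def group_rows(xml_text):
    """Group Row elements by ConceptID, mirroring the two Muenchian steps."""
    root = ET.fromstring(xml_text)
    rows = root.findall("Row")

    # Analogue of <xsl:key name="concept" match="Row" use="ConceptID"/>:
    # a lookup table from ConceptID to all rows carrying that value.
    index = defaultdict(list)
    for row in rows:
        index[row.findtext("ConceptID")].append(row)

    # Part 1: keep only the rows that are the first of their group
    # (the generate-id() comparison), sorted numerically by ConceptID.
    firsts = [r for r in rows if index[r.findtext("ConceptID")][0] is r]
    firsts.sort(key=lambda r: int(r.findtext("ConceptID")))

    out = ET.Element("Root")
    for first in firsts:
        concept_id = first.findtext("ConceptID")
        new_concept = ET.SubElement(out, "NewConcept")
        ET.SubElement(new_concept, "ConceptID").text = concept_id
        ET.SubElement(new_concept, "Concept").text = first.findtext("Concept")

        # Part 2: populate the group with every row sharing the ConceptID,
        # sorted by Usage (the sort is stable, so ties keep document order).
        members = sorted(index[concept_id],
                         key=lambda r: r.findtext("Usage", default=""))
        for row in members:
            term_g = ET.SubElement(new_concept, "TermG")
            for child in row:
                if child.tag not in ("ConceptID", "Concept"):
                    ET.SubElement(term_g, child.tag).text = child.text
    return out


if __name__ == "__main__":
    print(ET.tostring(group_rows(SAMPLE), encoding="unicode"))
```

The point of the sketch is that the index is built once and each group is then fetched by a single lookup, which is what `key('concept', ConceptID)` does in the stylesheet.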
michael.hor257k
  • Thank you for your solution. For me, Muench's approach is the way to go, and so is simplicity: this is done with a single template only. – MikeH Feb 01 '21 at 11:58
-1

Here's my solution. For me, using variables is easier than keys. Hope this helps you out.

<xsl:stylesheet version='1.0' xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:variable name='keyconcept' select='/Root/Row[not(ConceptID=preceding-sibling::Row/ConceptID)]'/>
<xsl:variable name='allconcept' select='/Root/Row'/>

<xsl:template match='/'>
    <Root>
        <xsl:for-each select='$keyconcept'>
            <xsl:variable name='conceptid' select='ConceptID'/>
            <NewConcept>
                <xsl:copy-of select='ConceptID'/>
                <xsl:copy-of select='Concept'/>
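                <!-- collect every row that shares the current ConceptID -->
                <xsl:for-each select='$allconcept[ConceptID = $conceptid]'>
                    <xsl:sort select='Usage'/>
                    <TermG>
                        <xsl:copy-of select='TermID'/>
                        <xsl:copy-of select='Term'/>
                        <xsl:copy-of select='Language'/>
                        <xsl:copy-of select='Usage'/>
                        <xsl:copy-of select='StatusLanguage'/>
                        <xsl:copy-of select='Source'/>
                    </TermG>
                </xsl:for-each>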
            </NewConcept>
        </xsl:for-each>
    </Root>

</xsl:template>
</xsl:stylesheet>
William Walseth
  • Using variables for grouping is a lot slower than using keys. For small inputs this won't be noticeable, for large inputs it can become an issue. – Tomalak Feb 01 '21 at 11:39
  • Thanks for your quick input. The template works, except that the ConceptID tag always carries the xmlns:xsl namespace declaration. Compared with Muench's method, this solution takes considerably longer to transform. – MikeH Feb 01 '21 at 11:46
  • Row[generate-id()=generate-id(key('concept', ConceptID)[1])] makes my head hurt. – William Walseth Feb 01 '21 at 15:21
  • FWIW I ran a couple of comparisons based on up to 100,000 "Row" elements. Muenchian grouping is faster in the general case, sometimes a LOT faster. The variable approach performs significantly worse when there are a lot of unique values in the key. In the special case where there are just a few unique values (like 10 or 20), the variable approach is just as fast, no matter how many rows (even up to 100,000 "Rows"). That case is pretty rare, so I agree Muenchian is probably worth the headache : ) – William Walseth Feb 01 '21 at 17:55
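
The pattern described in these comments can be illustrated with a rough Python model that counts comparisons instead of measuring time. This is a sketch under assumptions, not a benchmark of any XSLT processor: `distinct_by_scan` assumes the `preceding-sibling` test scans earlier rows from the start and stops at the first match, `distinct_by_index` assumes a key behaves like a hash table, and both function names are invented for this example.

```python
from collections import defaultdict


def distinct_by_scan(ids):
    """Model of Row[not(ConceptID = preceding-sibling::Row/ConceptID)].

    Returns the distinct values and the number of comparisons made,
    assuming the scan stops as soon as an earlier duplicate is found.
    """
    comparisons = 0
    firsts = []
    for i, value in enumerate(ids):
        seen_before = False
        for earlier in ids[:i]:
            comparisons += 1
            if earlier == value:
                seen_before = True
                break
        if not seen_before:
            firsts.append(value)
    return firsts, comparisons


def distinct_by_index(ids):
    """Model of a key-based lookup: build a hash index, then query it.

    Returns the distinct values and the number of index operations
    (one insertion plus one lookup per row).
    """
    operations = 0
    index = defaultdict(list)
    for position, value in enumerate(ids):
        index[value].append(position)
        operations += 1
    firsts = []
    for position, value in enumerate(ids):
        operations += 1
        if index[value][0] == position:
            firsts.append(value)
    return firsts, operations


if __name__ == "__main__":
    all_unique = list(range(1000))             # every row has its own ID
    few_unique = [i % 10 for i in range(1000)]  # only 10 distinct IDs
    for label, ids in (("all unique", all_unique), ("few unique", few_unique)):
        _, scan_cost = distinct_by_scan(ids)
        _, index_cost = distinct_by_index(ids)
        print(label, "scan:", scan_cost, "index:", index_cost)
```

In this model the scan grows quadratically when most values are unique but stays close to linear when there are only a handful of distinct values, while the index cost is linear in both cases. That is consistent with the observation above, though real processors may optimize either approach differently.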