This is a follow-up to my previous question, but I've narrowed the problem down, so it may be easier to solve. I have XML data in the following format:

<v1:publications xmlns:commons="v3.commons.pure.atira.dk"
             xmlns:v1="v1.publication-import.base-uk.pure.atira.dk">
<v1:book id="1" subType="book">
    <v1:peerReviewed>true</v1:peerReviewed>
    <v1:publicationCategory>scientific</v1:publicationCategory>
    <v1:publicationStatus>published</v1:publicationStatus>
    <v1:language>fi</v1:language>
    <v1:title>
        <commons:text>Introduction to scientific reduction</commons:text>
    </v1:title>
    <v1:abstract/>
    <v1:persons>
        <v1:author>
            <v1:role>author</v1:role>
            <v1:person>
                <v1:firstName>Jane</v1:firstName>
                <v1:lastName>Smith</v1:lastName>
            </v1:person>
        </v1:author>
    </v1:persons>
    <v1:organisations>
        <v1:organisation id="2250500"/>
    </v1:organisations>
    <v1:owner id="2250500"/>
    <v1:publicationDate>
     <commons:year>2013</commons:year>
  </v1:publicationDate>
    <v1:visibility>Public</v1:visibility>
    <v1:numberOfPages>2</v1:numberOfPages>
</v1:book>
<v1:book id="1" subType="book">
    <v1:persons>
        <v1:author>
            <v1:role>author</v1:role>
            <v1:person>
                <v1:firstName>John</v1:firstName>
                <v1:lastName>Doe</v1:lastName>
            </v1:person>
        </v1:author>
    </v1:persons>
    <v1:organisations>
        <v1:organisation id="220300"/>
    </v1:organisations>
    </v1:book>
</v1:publications>

The XSLT I have so far is this:

<?xml version="1.0"?>
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:commons="v3.commons.pure.atira.dk"
xmlns:v1="v1.publication-import.base-uk.pure.atira.dk"
exclude-result-prefixes="xsi xs"
version="2.0">

 <xsl:output method="xml" indent="yes" />
 <xsl:output omit-xml-declaration="yes" indent="yes"/> 
 <xsl:strip-space elements="*"/>

<xsl:template match="/">
<v1:publications>
  <xsl:for-each-group select="/v1:publications/v1:book" group-by="@id">
    <xsl:for-each-group select="current-group()" group-by="if(@Key) then @Key else 'no key'">
    <v1:book>  
        <!-- Copy attributes off the *first* GroupData element in the group -->
        <xsl:copy-of select="current-group()[1]/@*"/>

        <!-- Copy ItemData children from *all* GroupData elements in the group -->

         <xsl:copy-of select="current-group()/*" />

      </v1:book>
    </xsl:for-each-group>
  </xsl:for-each-group>
</v1:publications>
</xsl:template>
</xsl:stylesheet>

The problem is that it creates a separate node under <v1:book> for each duplicate (v1:persons), when I would like to combine them like this:

<v1:persons>
   <v1:author></v1:author>
   <v1:author></v1:author>
</v1:persons>

Fields like <v1:title/> I could easily remove from the XML beforehand so they are not a problem.

The desired output should look like the following; I edited a few fields (organisation id and owner id) to the correct values. This is actual data that imports correctly.

<?xml version="1.0" encoding="UTF-8"?>
<v1:publications xmlns:commons="v3.commons.pure.atira.dk"
             xmlns:v1="v1.publication-import.base-uk.pure.atira.dk">
<v1:book id="1" subType="book">
    <v1:peerReviewed>true</v1:peerReviewed>
    <v1:publicationCategory>scientific</v1:publicationCategory>
    <v1:publicationStatus>published</v1:publicationStatus>
    <v1:language>fi_FI</v1:language>
    <v1:title>
        <commons:text>Introduction to scientific reduction</commons:text>
    </v1:title>
    <v1:persons>
        <v1:author>
            <v1:role>author</v1:role>
            <v1:person>
                <v1:firstName>Jane</v1:firstName>
                <v1:lastName>Smith</v1:lastName>
            </v1:person>
        </v1:author>
        <v1:author>
            <v1:role>author</v1:role>
            <v1:person>
                <v1:firstName>John</v1:firstName>
                <v1:lastName>Doe</v1:lastName>
            </v1:person>
        </v1:author>
    </v1:persons>
    <v1:organisations>
        <v1:organisation id="2250500"/>
        <v1:organisation id="2250300"/>
    </v1:organisations>
    <v1:owner id="2250300"/>
    <v1:publicationDate>
     <commons:year>2013</commons:year>
  </v1:publicationDate>
    <v1:visibility>Public</v1:visibility>
    <v1:numberOfPages>2</v1:numberOfPages>
</v1:book>
</v1:publications>
  • Can you insert the actual value(s) you want in your desired result? As it is, it's unclear what data you want there (although we can guess). (Minor note: no need to add "Update: ..". Edits to a question should leave the question 'stand alone'.) (Also, both your sample XML and XSLT need a few closing elements.) – Jongware May 25 '15 at 10:57
  • @Jongware Thanks for the input. I added actual desired data above. It validates correctly when imported. – Anthony Berkins May 26 '15 at 08:44

2 Answers

Assuming the desired output is

<v1:persons>
    <v1:author>
        <v1:fullName>John Doe</v1:fullName>
    </v1:author>
</v1:persons>

you need to replace the <xsl:copy-of>, which makes a perfect copy (and so does not allow changes within), with a per-item copy template.

The general identity template, match="@*|node()", matches everything (see https://stackoverflow.com/a/617611/2564301), but it is overridden by the more specific template with match="v1:author". That template simply writes out the values of <v1:firstName> and <v1:lastName> with a single space in between.

For consistency, I added <xsl:copy> to copy the <v1:author> tag itself, plus <xsl:apply-templates select="@*"/> to copy all of its attributes. That way it will also work with an element such as

<v1:author id='1'>

– the attribute will be copied along as expected.

<xsl:template match="/">
<v1:publications>
  <xsl:for-each-group select="/v1:publications/v1:book" group-by="@id">
    <xsl:for-each-group select="current-group()" group-by="if(@Key) then @Key else 'no key'">
    <v1:book>  
        <!-- Copy attributes off the *first* GroupData element in the group -->
        <xsl:apply-templates select="current-group()[1]/@*"/>

        <!-- Copy ItemData children from *all* GroupData elements in the group -->
         <xsl:apply-templates select="current-group()/*" />

      </v1:book>
    </xsl:for-each-group>
  </xsl:for-each-group>
</v1:publications>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="v1:author">
    <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <v1:fullName>
            <xsl:value-of select="v1:person/v1:firstName" />
            <xsl:text> </xsl:text>
            <xsl:value-of select="v1:person/v1:lastName" />
        </v1:fullName>
    </xsl:copy>
</xsl:template>

This makes a faithful copy of the input but with the <v1:persons> section replaced by

  <v1:persons>
     <v1:author id="1">
        <v1:fullName>Jane Smith</v1:fullName>
     </v1:author>
     <v1:author id="2">
        <v1:fullName>Bob Sandurz</v1:fullName>
     </v1:author>
  </v1:persons>

(I added a second name and attributes for testing.)
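
If the goal is instead to merge the repeated containers, so that one <v1:persons> holds every <v1:author> of the books sharing an id, a nested grouping on the element name is one way to sketch it. This is only a sketch under assumptions that are not in the question: only v1:persons and v1:organisations need merging, and for every other child the first occurrence wins (so v1:owner keeps the value from the first book).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:commons="v3.commons.pure.atira.dk"
    xmlns:v1="v1.publication-import.base-uk.pure.atira.dk">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <v1:publications>
      <!-- One output book per distinct @id -->
      <xsl:for-each-group select="/v1:publications/v1:book" group-by="@id">
        <v1:book>
          <!-- Attributes are taken from the first book in the group -->
          <xsl:copy-of select="current-group()[1]/@*"/>

          <!-- Regroup the children of all books in this group by element name -->
          <xsl:for-each-group select="current-group()/*" group-by="node-name(.)">
            <xsl:choose>
              <!-- Assumed list of containers to merge: one wrapper, all children -->
              <xsl:when test="self::v1:persons or self::v1:organisations">
                <xsl:copy>
                  <xsl:copy-of select="current-group()/*"/>
                </xsl:copy>
              </xsl:when>
              <!-- Any other element: keep only the first occurrence -->
              <xsl:otherwise>
                <xsl:copy-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </v1:book>
      </xsl:for-each-group>
    </v1:publications>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input this should give a single <v1:book id="1"> whose <v1:persons> contains both <v1:author> elements and whose <v1:organisations> contains both <v1:organisation> elements. Elements are emitted in the order in which each name first appears, so the layout of the first book is preserved.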

Jongware
  • Thank you, I learned a lot from your example. But the problem for me is that what I get is: Jane Smith and, separately, Bob Sandurz, when what I need is Jane SmithBob Sandurz – Anthony Berkins May 25 '15 at 11:49
  • @AnthonyBerkins: ... `` was not mentioned in your question. But it's an easy fix: you are free to insert any (matching!) set of tags at any point in the output. – Jongware May 25 '15 at 12:28
  • Indeed I can, and that helps in future. But the problem persists that authors go under separate tags, not under the same one. – Anthony Berkins May 25 '15 at 12:49
  • @AnthonyBerkins: hm... *All* of the duplicate tags are repeated in the output. If that is not what you want, you may want to get rid of the `copy` of multiple books into one at all – the "Copy ItemData children from *all* GroupData elements" part. – Jongware May 25 '15 at 13:08

As a different approach that only copies the authors of books with the same id, without concatenating first and last names, the following XSLT

<?xml version="1.0"?>
<xsl:stylesheet 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:commons="v3.commons.pure.atira.dk"
 xmlns:v1="v1.publication-import.base-uk.pure.atira.dk"
 exclude-result-prefixes="xsi xs"
 version="2.0">
<xsl:output method="xml" indent="yes" />
<xsl:output omit-xml-declaration="yes" indent="yes"/> 
<xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <v1:publications>
      <xsl:for-each-group select="/v1:publications/v1:book" group-by="@id">  
        <xsl:apply-templates select="."/>
      </xsl:for-each-group>
    </v1:publications>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="v1:person">
  <xsl:variable name="id" select="ancestor::v1:book/@id"/>
    <xsl:copy-of select="//v1:person[ancestor::v1:book[@id=$id]]"/>
  </xsl:template>
</xsl:stylesheet>

when applied to your input XML produces the output (relevant part)

<v1:persons>
     <v1:author>
        <v1:role>author</v1:role>
        <v1:person>
           <v1:firstName>Jane</v1:firstName>
           <v1:lastName>Smith</v1:lastName>
        </v1:person>
        <v1:person>
           <v1:firstName>John</v1:firstName>
           <v1:lastName>Doe</v1:lastName>
        </v1:person>
     </v1:author>
  </v1:persons>

The template matching v1:person copies all v1:person nodes that are descendants of a book with the same id.
Saved Demo with an additional second book with a different id.
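
Note that the output above nests both <v1:person> elements inside one <v1:author>, while the desired output in the question has one <v1:author> per person. A possible variation, sketched here under a few assumptions, overrides v1:persons (and v1:organisations) instead of v1:person and pulls in the children from every book with the same id through a key. The key name book-by-id is made up for this example, and the sketch assumes the first book of each id group actually contains the container elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:v1="v1.publication-import.base-uk.pure.atira.dk">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical key: find every v1:book that shares a given @id -->
  <xsl:key name="book-by-id" match="v1:book" use="@id"/>

  <xsl:template match="/">
    <v1:publications>
      <!-- Process only the first book of each @id group -->
      <xsl:for-each-group select="/v1:publications/v1:book" group-by="@id">
        <xsl:apply-templates select="."/>
      </xsl:for-each-group>
    </v1:publications>
  </xsl:template>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One v1:persons holding the authors of all books with the same id -->
  <xsl:template match="v1:persons">
    <xsl:copy>
      <xsl:copy-of select="key('book-by-id', ancestor::v1:book/@id)/v1:persons/v1:author"/>
    </xsl:copy>
  </xsl:template>

  <!-- Same idea for the organisations container -->
  <xsl:template match="v1:organisations">
    <xsl:copy>
      <xsl:copy-of select="key('book-by-id', ancestor::v1:book/@id)/v1:organisations/v1:organisation"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

All other elements, such as <v1:owner>, are taken unchanged from the first book of each group.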

matthias_h