
I have an XML file in which I need to combine the values of a repeated element into a single element, with no duplicate values. Below is the input XML file.

           <AIRPORTSFILE>
           <document name="SAMPLE1">
                 <DEPARTURE_AIRPORT>D1</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-15</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-14</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A1</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>D2</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-14</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-15</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A2</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>D2</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-03-15</DEPARTURE_DATE>
                 <DEPARTURE_TIME>0615</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-03-15</ARRIVAL_DATE>
                 <ARRIVAL_TIME>0930</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>A2</ARRIVAL_AIRPORT>
          </document>


          <document name="SAMPLE2">
                 <DEPARTURE_AIRPORT>2014-06-05</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-06-05</DEPARTURE_DATE>
                 <DEPARTURE_TIME>1815</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-06-05</ARRIVAL_DATE>
                 <ARRIVAL_TIME>2130</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>P1</ARRIVAL_AIRPORT>

                 <DEPARTURE_AIRPORT>2014-06-06</DEPARTURE_AIRPORT>
                 <DEPARTURE_DATE>2014-06-06</DEPARTURE_DATE>
                 <DEPARTURE_TIME>1815</DEPARTURE_TIME>
                 <ARRIVAL_DATE>2014-06-05</ARRIVAL_DATE>
                 <ARRIVAL_TIME>2130</ARRIVAL_TIME>
                 <ARRIVAL_AIRPORT>P1</ARRIVAL_AIRPORT>
          </document>
          </AIRPORTSFILE>

The output needs to be:

         <catalog>
         <document name="SAMPLE1">
                <departureDate>2014-03-15,2014-03-14</departureDate>
                <arrivalAirport>A1,A2</arrivalAirport>
         </document>
         <document name="SAMPLE2">
                <departureDate>2014-06-05,2014-06-06</departureDate>
                <arrivalAirport>P1</arrivalAirport>
         </document>
         </catalog>

I have looked at XSLT 1.0 - Remove Duplicate Nodes From Variable and XSLT 1.0 - Remove duplicates fields for some reference, but cannot get it to work properly.

Below is what I have in my XSLT 1.0 stylesheet to try to get DEPARTURE_DATE to work.

<xsl:key name="kDepartureDate" match="DEPARTURE_DATE" use="."/>


<xsl:template match="@* | node()" name="Copy">
   <xsl:copy>
     <xsl:apply-templates select="@* | node()"/>
   </xsl:copy>
 </xsl:template>

<xsl:template match="DEPARTURE_DATE[generate-id() = 
                           generate-id(key('kDepartureDate', .)[1])]"  name="depDateCopy">
    <xsl:call-template name="Copy" />
</xsl:template>

<xsl:template match="AIRPORTSFILE">
    <catalog>
        <xsl:for-each select="document">
        <xsl:variable name="departureDate">
                <xsl:call-template name="depDateCopy"></xsl:call-template>
        </xsl:variable>
        </xsl:for-each>
     </catalog>
</xsl:template>

Any help will be much appreciated.

Raj
  • The most interesting part about your XSLT code is the presence of the `<catalog>` element in the template matching `AIRPORTSFILE`. – michael.hor257k Aug 28 '14 at 02:06
  • catalog is the root element that I want in the output XML. Can you help me with removing the duplicates? – Raj Aug 28 '14 at 02:59
  • I want catalog. But that doesn't make a difference to the reason the duplicates are not being removed does it? – Raj Aug 28 '14 at 03:25
  • What makes you think they are not being removed? – michael.hor257k Aug 28 '14 at 03:59
  • @michael.hor257k This is the result I am getting for one document: D1 2014-03-15 0615 2014-03-14 0930 A1 D2 2014-03-14 0615 2014-03-15 0930 A2 D2 2014-03-15 0615 2014-03-15 0930 A2 ALC,ALC,PFO – Raj Aug 28 '14 at 04:29
  • No, that is **NOT** the result you are getting. The result returned by your XSLT is this: `<catalog/>`. I don't see any duplicates here - do you? – michael.hor257k Aug 28 '14 at 06:09
  • Where is the definition of the named templates mentioned in your code? – Mathias Müller Aug 28 '14 at 08:33

1 Answer

Your current code looks complicated and long-winded enough that I think it's best to start from scratch, and by that I mean starting with how to approach the problem.

These are the steps to follow in order to solve your problem (this is one way of solving it, not the only one).

  • Write a template that matches AIRPORTSFILE and outputs a catalog element in its place. Apply templates to its content.
  • Write a template that matches document and copies it.

For the content of document:

  • Copy all the attributes of document
  • Introduce an element departureDate and find all elements DEPARTURE_DATE that have distinct values (using a key). Copy their text content. Output a comma if the current element is not the last one.
  • Introduce an element arrivalAirport and repeat the above.

This is a kind of pseudocode, written so that it is easy to reproduce in actual XSLT.

Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <!-- Keys index the elements by their string value across the whole input document -->
    <xsl:key name="dep-date" match="DEPARTURE_DATE" use="."/>
    <xsl:key name="arr-air" match="ARRIVAL_AIRPORT" use="."/>

    <xsl:template match="AIRPORTSFILE">
      <catalog>
          <xsl:apply-templates/>
      </catalog>
    </xsl:template>

    <xsl:template match="document">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <departureDate>
                <xsl:for-each select="DEPARTURE_DATE[count(. | key('dep-date', .)[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </departureDate>
            <arrivalAirport>
                <xsl:for-each select="ARRIVAL_AIRPORT[count(. | key('arr-air', .)[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </arrivalAirport> 
        </xsl:copy>
    </xsl:template>

</xsl:transform>

XML Output

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
   <document name="SAMPLE1">
      <departureDate>2014-03-15,2014-03-14</departureDate>
      <arrivalAirport>A1,A2</arrivalAirport>
   </document>
   <document name="SAMPLE2">
      <departureDate>2014-06-05,2014-06-06</departureDate>
      <arrivalAirport>P1</arrivalAirport>
   </document>
</catalog>
Mathias Müller
  • @michael.hor257k I did not intend to punish someone. Edited my answer, changed the stylesheet to work with keys, instead of the `following-sibling::` axis. – Mathias Müller Aug 28 '14 at 13:53
  • Thank you @Mathias Müller. This seems to work. The only thing I saw is that the second document's arrivalAirport is empty. If you look at the input XML file, both values in the second document are P1. I am assuming the for-each needs to be modified a bit. Thanks for your help though! – Raj Aug 28 '14 at 15:39
  • @Raj I don't see anything that needs modification. (Click [here](http://xsltransform.net/3NzcBsM) to see for yourself). In your input XML, there is only "P1", and the same in my output. Please consider accepting this answer (mark the tick on the left) if it was helpful to you. – Mathias Müller Aug 28 '14 at 15:42
  • @MathiasMüller that is weird. I had added an additional document where the arrival Airports were PF and they were empty. Anyway, thank you so much for your help! – Raj Aug 28 '14 at 16:17
  • @MathiasMüller Sorry to keep bothering you, but if you look at this transform, http://xsltransform.net/3NzcBsM/5 where the arrival airport value is three characters, then the second document is not able to pick up the arrival airport info. – Raj Aug 28 '14 at 16:46
  • @MathiasMüller I think I see the issue. So the documents can have the same arrival airport and departure date values. I think the key goes through all the values in the document and picks just the first instance of it. I want the ability of having duplicates removed for each document but the other documents can also have the same values. – Raj Aug 28 '14 at 17:00
  • @Raj I see your problem now. You want the unique values per document, not in the whole XML input. Either use `xsl:for-each-group` of XSLT 2.0 or use the first version of my answer (see the edit history). – Mathias Müller Aug 28 '14 at 18:07
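
The per-document requirement raised in the last comments can also be met in XSLT 1.0 while still using keys. What follows is only a minimal sketch, assuming that every DEPARTURE_DATE and ARRIVAL_AIRPORT is a direct child of its document element. The key value combines the parent's `generate-id()` with the element's own value, so equal values in different documents no longer collide. The key names `dep-date-by-doc` and `arr-air-by-doc` and the `|` separator are arbitrary choices made for this illustration; they are not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <!-- Compound keys (illustrative names): id of the parent document + '|' + value.
         Equal values in different documents therefore get different key values. -->
    <xsl:key name="dep-date-by-doc" match="DEPARTURE_DATE"
             use="concat(generate-id(..), '|', .)"/>
    <xsl:key name="arr-air-by-doc" match="ARRIVAL_AIRPORT"
             use="concat(generate-id(..), '|', .)"/>

    <xsl:template match="AIRPORTSFILE">
        <catalog>
            <xsl:apply-templates select="document"/>
        </catalog>
    </xsl:template>

    <xsl:template match="document">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <departureDate>
                <!-- Keep an element only if it is the first one in its key group,
                     i.e. the first with this value inside this document -->
                <xsl:for-each select="DEPARTURE_DATE[count(. | key('dep-date-by-doc',
                                      concat(generate-id(..), '|', .))[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </departureDate>
            <arrivalAirport>
                <xsl:for-each select="ARRIVAL_AIRPORT[count(. | key('arr-air-by-doc',
                                      concat(generate-id(..), '|', .))[1]) = 1]">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">
                        <xsl:text>,</xsl:text>
                    </xsl:if>
                </xsl:for-each>
            </arrivalAirport>
        </xsl:copy>
    </xsl:template>

</xsl:transform>
```

For the sample input in the question, this should give the same result as the stylesheet in the answer. The difference only shows when a later document repeats a value from an earlier one: with the compound key, that document keeps its own copy of the value instead of ending up with an empty element. In XSLT 2.0, `xsl:for-each-group` with `group-by="."` inside the `document` template would make the compound key unnecessary.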