
I need to remove the duplicate book elements from the following XML:

<ListOfRowIDWithListOfBooks xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
  <RowIDWithListOfBooks>
    <Row_ID>ADOA-XssK</Row_ID>
    <ListOfBookInfo>
      <book>
        <BookType>Brand</BookType>
        <BookName>jon</BookName>
      </book>
      <book>
        <BookType>Brand</BookType>
        <BookName>jon</BookName>
      </book>
    </ListOfBookInfo>
  </RowIDWithListOfBooks>
</ListOfRowIDWithListOfBooks>

Can anybody help?

Kirill Polishchuk
Steph

4 Answers

This is a standard grouping problem, so use a standard grouping solution. Avoid selecting nodes by comparing each one against its preceding siblings; that approach is well known to cause performance problems on large inputs.

Note: the reference to identity.xsl simply includes the well-known identity transformation template in the stylesheet.
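
For readers who do not have such a file at hand, here is a minimal sketch of what identity.xsl is assumed to contain. The original file is not shown, so the contents below are an assumption based on the standard identity template.

```xml
<!-- identity.xsl (assumed contents; the original file is not shown) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template: copies every attribute and node unchanged
         unless a more specific template matches it. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

A template that matches an element by name, such as ListOfBookInfo, has a higher default priority than node(), so it takes over for that element while everything else is still copied as-is.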

[XSLT 1.0]

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:key name="k-books" match="book" use="concat(BookType,'|',BookName)"/>

    <xsl:include href="identity.xsl"/>

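    <xsl:template match="ListOfBookInfo">
        <!-- xsl:copy already emits the ListOfBookInfo element,
             so no literal wrapper element is needed -->
        <xsl:copy>
            <!-- keep only the first book of each BookType|BookName group -->
            <xsl:apply-templates select="book
                [generate-id()
                =generate-id(key('k-books',concat(BookType,'|',BookName))[1])]"/>
        </xsl:copy>
    </xsl:template>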

</xsl:stylesheet>

[XSLT 2.0]

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:include href="identity.xsl"/>

    <xsl:template match="ListOfBookInfo">
        <ListOfBookInfo>
            <xsl:for-each-group select="book" 
                group-by="concat(BookType,'|',BookName)">
                <xsl:apply-templates select="."/>
            </xsl:for-each-group>
        </ListOfBookInfo>
    </xsl:template>

</xsl:stylesheet>
Emiliano Poggi
  • you should replace your `xsl:include` with the actual identity transform. That way your stylesheet will work for someone who has no idea what the identity transform looks like. – Daniel Haley Jul 22 '11 at 00:48
  • @DevNull, that's what exactly `xsl:include` does ;-) I've better included a note with a reference. Thanks for your feedback. – Emiliano Poggi Jul 22 '11 at 04:29

Try this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//ListOfBookInfo/book"/>
  </xsl:template>

  <!-- Copy a book only if no single preceding sibling has both the same
       BookType and the same BookName -->
  <xsl:template match="book">
    <xsl:if test="not(preceding-sibling::book[BookType = current()/BookType
                                              and BookName = current()/BookName])">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

It selects the books with a unique combination of BookType and BookName. For your sample, the result should be:

<book xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
        <BookType>Brand</BookType>
        <BookName>jon</BookName>
      </book>
Kirill Polishchuk

You need to group them using the Muenchian grouping method, or the dedicated grouping instructions in XSLT 2.0. Here are two relevant Stack Overflow questions:

How to use group by in xslt

How to output duplicate elements using XSLT?

Community
Bronumski
  • I need the final result to look like this and I just don't understand :( ADOA-XssK Brand jon – Steph Jul 20 '11 at 21:20
  • I don't think grouping is going to do what I need :( It's all in the select. No matter how I do it, it comes down to not knowing what to do with the select – Steph Jul 20 '11 at 22:08
  • Honestly, if you're stuck with XSLT 1.0, you're better off running it through an XML parser first. – hoodaticus Jul 20 '11 at 23:54

If you are interested in how this is achieved using Muenchian Grouping, which is a common technique in XSLT, you first need to define a 'key' to identify duplicate books within a row.

<xsl:key 
   name="books"
   match="book"
   use="concat(../../Row_ID, '#', BookType, '#', BookName)" />

Here the key is the concatenation of Row_ID, BookType and BookName, so each key value maps to the list of books that share those three values. Note the use of the # character as the separator. If there is any chance of # appearing in your data, you will need to pick another character (or string).

Now, when you are matching book elements, you can check for duplicates like so:

<xsl:variable 
  name="lookup"
  select="concat(../../Row_ID, '#', BookType, '#', BookName)" />
<xsl:if test="generate-id() = generate-id(key('books', $lookup)[1])">
  <!-- this book is the first one with this key value: copy it here -->
</xsl:if>

In other words: is this book element the first element in the key's list for its key value?
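
As an aside, the same first-in-group test is often written with count() and a node-set union instead of generate-id(). The sketch below assumes the 'books' key defined above and is an alternative form of the same check, not a different technique.

```xml
<xsl:template match="book">
   <xsl:variable name="lookup"
      select="concat(../../Row_ID, '#', BookType, '#', BookName)"/>
   <!-- The union of the current node and the first node in its key group
        contains exactly one node only when both are the same node. -->
   <xsl:if test="count(. | key('books', $lookup)[1]) = 1">
      <xsl:copy-of select="."/>
   </xsl:if>
</xsl:template>
```

Both forms keep the first book of each group and drop the rest, so the choice between them is mostly a matter of taste.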

Here is the full XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="xml" indent="yes"/>
   <xsl:key 
      name="books"
      match="book"
      use="concat(../../Row_ID, '#', BookType, '#', BookName)"/>

   <xsl:template match="book">
      <xsl:variable name="lookup" select="concat(../../Row_ID, '#', BookType, '#', BookName)"/>
      <xsl:if test="generate-id() = generate-id(key('books', $lookup)[1])">
         <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
         </xsl:copy>
      </xsl:if>
   </xsl:template>

   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

Also note the use of the identity transform so that other nodes can be copied without having to explicitly reference them. When this XSLT is applied to your input, the following output is generated:

<ListOfRowIDWithListOfBooks xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
   <RowIDWithListOfBooks>
      <Row_ID>ADOA-XssK</Row_ID>
      <ListOfBookInfo>
         <book>
            <BookType>Brand</BookType>
            <BookName>jon</BookName>
         </book>
      </ListOfBookInfo>
   </RowIDWithListOfBooks>
</ListOfRowIDWithListOfBooks>

EDIT: I have amended the XSLT to remove an unnecessary template match.

Tim C