
I have XML that looks something like this:

<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="pqr" displayName="ppp" />
    <Field name="abc" displayName="aaa" />
    <Field name="xyz" displayName="zzz" />
  </Fields>
</Root>

I want the output to contain only those elements that have a repeated name/displayName combination, if there are any:

<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="abc" displayName="aaa" />
  </Fields>
</Root>

How can I do this using XSLT?

Unmesh Kondolikar

2 Answers

I. XSLT 1.0 solution:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kFieldByName" match="Field"
  use="concat(@name, '+', @displayName)"/>

 <xsl:template match=
  "Field[generate-id()
        =
         generate-id(key('kFieldByName',
                     concat(@name, '+', @displayName)
                     )[2])
        ]
  ">
     <xsl:copy-of select=
     "key('kFieldByName',concat(@name, '+', @displayName))"/>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<Root>
    <Fields>
        <Field name="abc" displayName="aaa" />
        <Field name="pqr" displayName="ppp" />
        <Field name="abc" displayName="aaa" />
        <Field name="xyz" displayName="zzz" />
    </Fields>
</Root>

produces the wanted Field elements (without the enclosing Root and Fields wrappers):

<Field name="abc" displayName="aaa"/>
<Field name="abc" displayName="aaa"/>

Explanation:

  1. Muenchian grouping using a composite key (built from the name and displayName attributes).

  2. The only template in the code matches any Field element that is the second in its corresponding group. Then, inside the body of the template, the whole group is output.

  3. Muenchian grouping is the efficient way to do grouping in XSLT 1.0. Keys are used for efficiency.

  4. See also my answer to this question.
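
The transformation above outputs just the matching Field elements. If the Root and Fields wrappers from the question are also needed, one possible variation (a sketch, not part of the original answer) reuses the same key, copies everything with the identity rule, and suppresses every Field whose group has no second member:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Same composite key as above -->
 <xsl:key name="kFieldByName" match="Field"
  use="concat(@name, '+', @displayName)"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Empty template: drop any Field whose group has no second member -->
 <xsl:template match=
  "Field[not(key('kFieldByName',
                 concat(@name, '+', @displayName)
                )[2])
        ]"/>
</xsl:stylesheet>
```

Unlike the single template above, this keeps the duplicated elements in document order instead of emitting each group together. For the sample document the two orders coincide. It also assumes that neither attribute value contains the '+' separator in a way that makes two different pairs concatenate to the same string.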

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:for-each-group select="/*/*/Field"
          group-by="concat(@name, '+', @displayName)">
       <xsl:sequence select="current-group()[current-group()[2]]"/>
   </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (shown above), the same wanted result is produced:

<Field name="abc" displayName="aaa"/>
<Field name="abc" displayName="aaa"/>

Explanation:

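  1. Use of <xsl:for-each-group> with a group-by key built from both attributes.

  2. Use of the current-group() function. The predicate [current-group()[2]] is true only when the group has a second item, so groups with a single member produce no output.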
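
The same test can be spelled out with count(), which some readers may find easier to follow. The sketch below is an illustration rather than part of the original answer; it also assumes the wrappers from the question are wanted and hard-codes them as literal Root and Fields elements:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
   <!-- Literal wrappers, assumed to match the input's Root/Fields structure -->
   <Root>
     <Fields>
       <xsl:for-each-group select="/*/*/Field"
            group-by="concat(@name, '+', @displayName)">
         <!-- Emit the whole group only when it has at least two members -->
         <xsl:if test="count(current-group()) gt 1">
           <xsl:sequence select="current-group()"/>
         </xsl:if>
       </xsl:for-each-group>
     </Fields>
   </Root>
 </xsl:template>
</xsl:stylesheet>
```

If no combination repeats, this variant still writes the Root and Fields wrappers, just with no Field children.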

Dimitre Novatchev

To find duplicates, iterate over the Field elements and, for each one, look up the set of Field elements in the whole document that have the same name and displayName attribute values. If that set has more than one element, copy the current element to the output.

Here is an example of a template that achieves this:

<xsl:template match="Field">
    <xsl:variable name="fieldName" select="@name" />
    <xsl:variable name="fieldDisplayName" select="@displayName" />
    <xsl:if test="count(//Field[@name=$fieldName and @displayName=$fieldDisplayName]) > 1">
        <xsl:copy-of select="."/>
    </xsl:if>
</xsl:template>

Executing this template (wrapped in an appropriate XSLT file) on your sample data gives the following output:

<?xml version="1.0" encoding="utf-8"?>
<Root>
  <Fields>
    <Field name="abc" displayName="aaa" />
    <Field name="abc" displayName="aaa" />
  </Fields>
</Root>
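
For reference, one way the template might be wrapped is sketched below. The surrounding stylesheet is an assumption, not something given in the answer: the match="*" template and the output settings are illustrative choices that copy Root and Fields and let the Field template decide what to keep.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Assumed wrapper: copy any other element (Root, Fields) and recurse -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- The template from the answer: keep a Field only if its
         name/displayName pair occurs more than once in the document -->
    <xsl:template match="Field">
        <xsl:variable name="fieldName" select="@name" />
        <xsl:variable name="fieldDisplayName" select="@displayName" />
        <xsl:if test="count(//Field[@name=$fieldName and @displayName=$fieldDisplayName]) &gt; 1">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

The more specific match="Field" template takes precedence over match="*" through the default priority rules, so no explicit priority attribute is needed.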
Jeff Yates
  • @Jeff Yates: This is one possible solution, however its efficiency is O(N^2) and it is too slow to be used on XML documents with a large number of `Field` elements. See my answer for an efficient solution. – Dimitre Novatchev May 09 '11 at 13:21
  • @Dimitre: Seems silly to do more effort than necessary. There is no reason to believe the real XML would be huge and there is no profiling information. I'd go for quick to write over quick to run any day until the profiling is in. – Jeff Yates May 09 '11 at 13:28
  • @Jeff Yates: One can and should use the known most-efficient solutions. Because people think otherwise, we encounter problems every day where a transformation runs for 40 minutes and, once refactored with Muenchian grouping, takes only 2 seconds. We should not propagate bad and naive algorithms. – Dimitre Novatchev May 09 '11 at 13:33
  • @Dimitre: You are right although one should also consider the cost of implementation and maintenance when optimizing up front. – Jeff Yates May 09 '11 at 13:38
  • @Jeff Yates: This cost is zero when the algorithm is well-known and implemented many times in the past. – Dimitre Novatchev May 09 '11 at 14:26
  • While the efficiency might be O(N^2) on many XSLT processors, it might be much better on an optimizing processor - try it on Saxon-EE. However, I agree it's best not to place too heavy a reliance on the optimizer - use xsl:for-each-group. – Michael Kay May 09 '11 at 15:46