
I have some XML from which I would like to remove identical child nodes that appear consecutively in different parents. That is, if the same child node appears in two or more consecutive parents in my XML tree, I want to keep the first occurrence and remove all the duplicates.

The duplicate nodes I'm thinking of are the <child>a</child> in the first two <parent> nodes.

An example:

Here is the source XML:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bb</child>
      <child>cc</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bbbb</child>
      <child>cccc</child>
   </parent>

</root>

Here is the desired XML:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>bb</child>
      <child>cc</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bbbb</child>
      <child>cccc</child>
   </parent>

</root>

Only one element is removed here, but if there were, for example, five consecutive <child>a</child> nodes at the beginning (instead of two), four of them would be removed. I'm using XSLT 2.0.

I appreciate any help.

Follow-Up:

Thanks to Kirill, I get the documents I want. However, this has spawned a new problem that I didn't anticipate. If I have an XML document like this:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

And I apply Kirill's XSLT, I get this:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

How can I also remove the <parent> </parent>? For my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the <parent> element.

One solution I have, but don't like, is to apply a second transform after the first one. It only works when the two are applied in order, though, and it requires a separate XSLT file and two commands instead of one.

Here it is:

 <xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
 </xsl:template>

 <xsl:template match="parent[not(child)]"/>
devin

3 Answers


Use:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>

</xsl:stylesheet>
Kirill Polishchuk

If you're able to use XSLT 2.0, the problem is solved as follows:

<xsl:for-each-group select="parent" group-adjacent="child[1]">
  <xsl:for-each select="current-group()">
    <parent>
      <xsl:if test="position()=1">
        <xsl:copy-of select="current-group()[1]/child[1]"/>
      </xsl:if>
      <xsl:copy-of select="child[position() gt 1]"/>
    </parent>
  </xsl:for-each>
</xsl:for-each-group>
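The fragment above has to sit inside a template whose context node is the element that contains the `parent` elements. A minimal sketch of a complete stylesheet follows, assuming the `parent` elements are direct children of `root` as in the question. Two details are assumptions added here rather than part of the answer: the grouping key is wrapped in `string()` so that a `parent` without any `child` does not raise a type error, and each `parent` copies its own children relative to the context node. Note that, like the fragment, this only compares the first `child` of each `parent`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="root">
    <xsl:copy>
      <!-- Group adjacent parents whose first child has the same string value. -->
      <xsl:for-each-group select="parent" group-adjacent="string(child[1])">
        <xsl:for-each select="current-group()">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- Only the first parent of a group keeps its first child. -->
            <xsl:if test="position() = 1">
              <xsl:copy-of select="child[1]"/>
            </xsl:if>
            <!-- Every parent keeps its own remaining children. -->
            <xsl:copy-of select="child[position() gt 1]"/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the source XML in the question, this should drop only the `<child>a</child>` in the second `parent`, because the first two `parent` elements form one group and the fourth starts a new group of its own.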
Michael Kay
  • yeah I'm using saxon, which I think is the only XSLT 2.0 parser (to my knowledge at least) – devin Nov 11 '11 at 21:23
  • Wouldn't this only deal with the first child of each `parent` node? From what I read of the question, if the second `child` node was `b`, this should also be removed. – Flynn1179 Nov 12 '11 at 00:21
  • @Flynn1179: This is exactly why I asked in a comment the OP to define the problem correctly. I think he completely deserves the votes to close the question, as well as the -1. – Dimitre Novatchev Nov 12 '11 at 03:48
  • @Dimitre: Are you serious? You actually believe it deserved to be closed just because you didn't understand it? Wow. Putting aside from the fact that myself and Kirill both understood it just fine, you really need to realise that some problems are inevitably difficult to describe; not being an expert user isn't justification for even a downvote, much less a vote to close. – Flynn1179 Nov 13 '11 at 15:47
  • @Flynn1179: You'd be right hadn't there been more than one questions to this user to clarify and his absolute unwillingness/failure to do so. For example: did he reply to *your* question and did he specify if nodes should be compared only if they have the same "position within their parents"? Without this defined this question doesn't make sense. We are not asking puzzles, we are asking *questions*. – Dimitre Novatchev Nov 13 '11 at 15:57

This answers the newly added followup question:

How can I also remove the <parent> </parent>? For my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the element.

This transformation is an add-on to Kirill's and cleans up the parent elements that would otherwise end up empty, without the need for a second pass:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>

  <xsl:template match=
  "parent
     [not(child
          [not(. = ../preceding-sibling::parent[1]
                                              /child
               )
           ]
          )
     ]"/>
</xsl:stylesheet>

when applied to the provided XML document:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

the wanted, correct result is produced:

<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
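Since the question mentions XSLT 2.0, the two-step approach from the question could also be kept as two logical passes inside a single stylesheet. The following is only a sketch of that alternative, not part of the solution above: the mode names `dedupe` and `prune` and the variable name `pass1` are made up for illustration. The first pass writes its result into a temporary tree, and the second pass removes every `parent` that is left without a `child`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Run pass 1 into a temporary tree, then run pass 2 over that tree. -->
  <xsl:template match="/">
    <xsl:variable name="pass1">
      <xsl:apply-templates select="node()" mode="dedupe"/>
    </xsl:variable>
    <xsl:apply-templates select="$pass1/node()" mode="prune"/>
  </xsl:template>

  <!-- Identity template shared by both modes; #current keeps the active mode. -->
  <xsl:template match="@* | node()" mode="dedupe prune">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1: drop a child that equals some child of the preceding parent. -->
  <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"
                mode="dedupe"/>

  <!-- Pass 2: drop every parent that has no child left after pass 1. -->
  <xsl:template match="parent[not(child)]" mode="prune"/>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor, because it applies templates to a temporary tree held in a variable and lists two modes on one template. The single-pass stylesheet above avoids the temporary tree, so the sketch is mainly useful when the second step is easier to express against the output of the first.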
Dimitre Novatchev