I want to remove duplicate nodes from an XML structure

Question

I want to remove the duplicate nodes from the XML below using XSLT. I am using XSLT to transform one XML structure into another. How can I get the desired result?

The XML is generated by one application and consumed by another. The data coming from the source application contains some redundant nodes, as in the example below. I have to place the transformed XML in the folder from which the target application will read it.

Please ask a **specific** question about a difficulty you encountered when trying to accomplish this. Otherwise it looks like you're just looking for someone to write your code for you. Also reduce the example to the minimum necessary to demonstrate the problem - see: [mcve]. – michael.hor257k, May 17 '21 at 13:00
I have tried multiple things, but I am failing to get the required result. – Soumya.r, May 17 '21 at 13:05
And here you could find some first ideas: https://stackoverflow.com/questions/17862028/remove-duplicate-nodes-from-xml-file-using-xsl – Siebe Jongebloed, May 17 '21 at 13:28
@SiebeJongebloed The question is tagged as `xslt-2.0`. Why are you referring to an XSLT 1.0 answer? – michael.hor257k, May 17 '21 at 13:30
@michael.hor257k: not only as xslt-2.0...but ok: here is another one: https://stackoverflow.com/questions/10912544/removing-duplicate-elements-with-xslt?noredirect=1&lq=1 – Siebe Jongebloed, May 17 '21 at 13:33
Your source is not well-formed anymore. Could it be that the root needs some kind of namespace, like in the target? – Siebe Jongebloed, May 17 '21 at 13:42
Please note that any sensitive data can still be viewed by anyone (even visitors who are not logged in) by clicking on the "Edited" link. To remove it from most eyes, use the "Delete" option. That's still going to be visible to some higher level moderators, I believe, so you may want to Flag it for a moderator to see if someone higher up the chain can "really delete" it for you. – NotTheDr01ds, Jun 05 '21 at 06:42
OP, editors, please do not vandalize the post. If there is sensitive data, flag for mod attention and explain. Anything that is posted on SO is covered with CC BY-SA: https://creativecommons.org/licenses/by-sa/4.0/ – Vega, Jun 05 '21 at 07:20
Please see: [I've thought better of my question; can I delete it?](/help/what-to-do-instead-of-deleting-question) in the help center. We support removal and redaction of truly sensitive information. However, we don't support wholesale destruction of a question and/or the associated answers. You should work to edit the question in such a way which removes the sensitive information, but retains the substance and subtlety of the question. Once done, you can flag to ask for redaction. Editing the information out of answers, and redacting, is permitted, but the answers must retain their usefulness. – Makyen, Jun 05 '21 at 08:11

score 1 · Answer 1 · edited Jun 24 '21 at 06:19

It is quite simple to arrange.

For example, I am looking for duplicate file names in an XML structure like this:

<file>
      <name>some-name</name>
</file>

I make a key like this:

<xsl:key name="dupfile" match="file" use="name"/>

Then I create a template like this:

<xsl:template match="file[not(generate-id() = generate-id(key('dupfile', name)[1]))]">
    <!-- intentionally empty: drops every file that is not the first one with its name -->
</xsl:template>

This is called the Muenchian method; you can find more information about it here: http://www.jenitennison.com/xslt/grouping/muenchian.html
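For context, the key and the empty template can be combined into a complete XSLT 1.0 stylesheet. The following is a minimal sketch, assuming the `file`/`name` structure shown above. The identity template is an addition that is not part of the original snippet; it copies everything that the empty template does not suppress.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every file element by the string value of its name child -->
  <xsl:key name="dupfile" match="file" use="name"/>

  <!-- Identity template (assumed addition): copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress any file that is not the first file with the same name -->
  <xsl:template match="file[not(generate-id() = generate-id(key('dupfile', name)[1]))]"/>

</xsl:stylesheet>
```

The more specific `file[...]` pattern takes priority over the identity template, so only the first `file` for each distinct `name` value reaches the output. In XSLT 2.0, which the question is tagged with, `xsl:for-each-group` is the built-in alternative.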

answered May 19 '21 at 08:35 by Bert Verhees · edited Jun 24 '21 at 06:19 by halfer

The Muenchian (not München) method is required in XSLT 1.0. This question is tagged as `xslt-2.0` where better, built-in methods exist. – michael.hor257k May 19 '21 at 10:38
See my comment to the other answer. And also: https://stackoverflow.com/tags/xslt-grouping/info And also: https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-distinct-values – michael.hor257k May 19 '21 at 11:38
@Soumya.r: I have restored this post to its original version. There is nothing in here that could reasonably be considered to be intellectual property. – halfer Jun 24 '21 at 06:21
If you want to remove part of the question then make an edit to that (just to remove the confidential data, not the whole of the code) and flag for mod, as you've been advised. Thanks! – halfer Jun 24 '21 at 06:26

Siebe Jongebloed · Answer 2 · 2021-05-17T17:08:20.900

If (as in your template) it is enough to filter on the Value element, then this will work:

<xsl:stylesheet 
  version="2.0" 
  xmlns:infor="http://schema.infor.com/InforOAGIS/2" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  exclude-result-prefixes="#all">

  <xsl:output method="xml" encoding="UTF-8" indent="no" byte-order-mark="no"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="infor:Concur_LN_ServiceData">
    <xsl:if test="not(following-sibling::infor:Concur_LN_ServiceData[infor:Value=current()/infor:Value])">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>       
  </xsl:template>
</xsl:stylesheet>

There were two problems in your XSLT:

Namespace: `http://schema.infor.com/InforOAGIS/2` had no prefix (see this example), so the unprefixed element names in your XPath could not match the namespaced elements.
XPath: `following::Concur_LN_ServiceData[Concur_LN_ServiceData` cannot find anything, because there is no Concur_LN_ServiceData element that has a child element Concur_LN_ServiceData.

Also, declare only the namespaces that you actually use, but that is just my personal preference.
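As a side note on the namespace problem: XSLT 2.0 also allows the `xpath-default-namespace` attribute, which makes unprefixed element names in XPath expressions and match patterns refer to a given namespace. The following is a minimal sketch, assuming the element names used in the stylesheet above (the original input is not shown here, so they are taken on trust). Unlike the stylesheet above, it keeps the first occurrence of each Value rather than the last.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://schema.infor.com/InforOAGIS/2">

  <!-- Identity template: copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop an entry when an earlier sibling entry has the same Value;
       the unprefixed names resolve via xpath-default-namespace -->
  <xsl:template match="Concur_LN_ServiceData[Value = preceding-sibling::Concur_LN_ServiceData/Value]"/>

</xsl:stylesheet>
```

This still scans siblings for every entry, so it has the same performance characteristics as the sibling-based filter and is mainly useful for small inputs.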

EDIT

If you are dealing with large XML, it is better to use `xsl:for-each-group` (as @michael.hor257k suggests):

<xsl:stylesheet 
  version="2.0" 
  xmlns:infor="http://schema.infor.com/InforOAGIS/2" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  exclude-result-prefixes="#all">

  <xsl:output method="xml" encoding="UTF-8" indent="no" byte-order-mark="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="infor:DataArea">
    <xsl:copy>
      <xsl:apply-templates select="infor:Show"/>
      <xsl:for-each-group select="infor:Concur_LN_ServiceData" group-by="infor:Value">
        <xsl:sequence select="current-group()[1]"/>
      </xsl:for-each-group>
    </xsl:copy> 
  </xsl:template>
  
</xsl:stylesheet>

This is a very inefficient method. Why on earth would you not use `xsl:for-each-group` - esp. after you have already pointed to it as the correct solution? – michael.hor257k, May 17 '21 at 16:43
