
I have an XML file similar to this (with more nodes and details removed):

<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
    <CollectionDetails>
        <Collection>ILR</Collection>
        <Year>1112</Year>
        <FilePreparationDate>2011-10-06</FilePreparationDate>
    </CollectionDetails>
    <Source>
        <ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>          
    </Source>
</Header>
<SourceFiles>
    <SourceFile>
        <SourceFileName>A10004705001112004401.ER</SourceFileName>
        <FilePreparationDate>2011-10-05</FilePreparationDate>
    </SourceFile>
</SourceFiles>
<LearningProvider>
    <UKPRN>10004705</UKPRN>
    <UPIN>107949</UPIN>
</LearningProvider>
<Learner>
    <ULN>4682272097</ULN>
    <GivenNames>Peter</GivenNames>
    <LearningDelivery>
        <LearnAimRef>60000776</LearnAimRef>         
    </LearningDelivery>     
    <LearningDelivery>
        <LearnAimRef>ZPROG001</LearnAimRef>         
    </LearningDelivery>
</Learner>
<Learner>
    <ULN>3072094321</ULN>       
    <GivenNames>Thomas</GivenNames>     
    <LearningDelivery>
        <LearnAimRef>10055320</LearnAimRef>         
    </LearningDelivery>
    <LearningDelivery>
        <LearnAimRef>10002856</LearnAimRef>         
    </LearningDelivery>
    <LearningDelivery>
        <LearnAimRef>1000287X</LearnAimRef>         
    </LearningDelivery>
</Learner>
</Message>

I need to filter this so that only Learner records with a child LearningDelivery whose LearnAimRef is ZPROG001 are kept. In this case the output would include the first learner but not the second:

<?xml version="1.0" encoding="utf-8"?>
<Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
    <CollectionDetails>
        <Collection>ILR</Collection>
        <Year>1112</Year>
        <FilePreparationDate>2011-10-06</FilePreparationDate>
    </CollectionDetails>
    <Source>
        <ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>          
    </Source>
</Header>
<SourceFiles>
    <SourceFile>
        <SourceFileName>A10004705001112004401.ER</SourceFileName>
        <FilePreparationDate>2011-10-05</FilePreparationDate>
    </SourceFile>
</SourceFiles>
<LearningProvider>
    <UKPRN>10004705</UKPRN>
    <UPIN>107949</UPIN>
</LearningProvider>
<Learner>
    <ULN>4682272097</ULN>
    <GivenNames>Peter</GivenNames>
    <LearningDelivery>
        <LearnAimRef>60000776</LearnAimRef>         
    </LearningDelivery>     
    <LearningDelivery>
        <LearnAimRef>ZPROG001</LearnAimRef>         
    </LearningDelivery>
</Learner>
</Message>

I have looked into how to do this and believe the right way is to use an XSL transform to process the XML and write the result to a new file (I'm doing this in C#). After a couple of hours trying to wrap my head around the XSLT syntax, I'm still stuck and can't get the output I want. Any help much appreciated.

PeteT

2 Answers


To copy most of an XML source document, modifying only certain parts, you'll want to start with an identity transform. This just copies everything. Then add a template to override the identity template for <Learner> elements you don't want to copy:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
  <!-- identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- override the above template for certain Learner elements; output nothing. -->
  <xsl:template match="theia:Learner[
     not(theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001')]">
  </xsl:template>
</xsl:stylesheet>
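If the LearnAimRef to keep might change, one option is to pass it in as a stylesheet parameter. XSLT 1.0 does not allow variable or parameter references inside a match pattern, so the test has to move into the template body. The following is a sketch under that assumption; the parameter name aimRef is hypothetical, and ZPROG001 is only its default value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
  <!-- hypothetical parameter; defaults to the value from the question -->
  <xsl:param name="aimRef" select="'ZPROG001'"/>
  <!-- identity template: copy everything not handled elsewhere -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- $aimRef cannot appear in a 1.0 match pattern, so match every Learner
       and copy it only if one of its LearnAimRef values equals $aimRef -->
  <xsl:template match="theia:Learner">
    <xsl:if test="theia:LearningDelivery/theia:LearnAimRef = $aimRef">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

In .NET, a different value could be supplied with XsltArgumentList.AddParam("aimRef", "", "ZPROG001") and passed to XslCompiledTransform.Transform; if no argument is given, the default in xsl:param applies.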

(Borrowing the theia namespace prefix from @andyb.)

LarsH
  • Excellent answer, this did exactly what I wanted. I wish I understood XSLT a bit more, but it's just a one-off for me. – PeteT Oct 07 '11 at 09:00

If you just want all the <Learner> elements that have a descendant (in this case LearnAimRef) with a particular value, then you can use a predicate expression (the bit between the [ and ]) to filter the node-set.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
<xsl:template match="/theia:Message">
  <xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef='ZPROG001']"/>
</xsl:template>
</xsl:stylesheet>

So the copy-of reads as: copy all the Learner nodes that have at least one child called LearningDelivery with a child called LearnAimRef whose value equals ZPROG001.
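The = in that predicate compares a node-set with a string, and in XPath 1.0 such a comparison is true if any node in the set matches. That is why Peter is selected even though his first LearningDelivery is 60000776, and why the identity-transform answer uses not(... = 'ZPROG001') rather than != 'ZPROG001' (the latter is true whenever any LearnAimRef differs). This sketch prints all three tests for each Learner as plain text, assuming the sample file from the question as input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="theia:Message/theia:Learner">
      <xsl:value-of select="theia:GivenNames"/>
      <!-- true if ANY LearnAimRef equals ZPROG001 -->
      <xsl:text>: = </xsl:text>
      <xsl:value-of select="theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001'"/>
      <!-- true if ANY LearnAimRef differs from ZPROG001 -->
      <xsl:text>, != </xsl:text>
      <xsl:value-of select="theia:LearningDelivery/theia:LearnAimRef != 'ZPROG001'"/>
      <!-- true only if NO LearnAimRef equals ZPROG001 -->
      <xsl:text>, not(=) </xsl:text>
      <xsl:value-of select="not(theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should print:

Peter: = true, != true, not(=) false
Thomas: = false, != true, not(=) true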

Your XML document declares a default namespace of "http://www.theia.org.uk/ILR/2011-12/1", so every element in it belongs to that namespace. In XPath 1.0 an unprefixed name only matches elements in no namespace, so the XPath has to use a prefix bound to the same namespace URI. In the XSLT above, I have bound your namespace to the prefix theia and used that prefix in the XPath.
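A quick way to see the namespace issue is to count Learner elements with and without a prefix. This is a minimal sketch run against the sample file from the question; the prefix name theia is arbitrary, since only the namespace URI it is bound to has to match the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- unprefixed names mean "no namespace" in XPath 1.0, so this finds nothing -->
    <xsl:text>unprefixed: </xsl:text>
    <xsl:value-of select="count(/Message/Learner)"/>
    <!-- the prefix is bound to the document's default namespace URI -->
    <xsl:text>&#10;prefixed: </xsl:text>
    <xsl:value-of select="count(/theia:Message/theia:Learner)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input the first count is 0 and the second is 2.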

If you want other parts of the XML source copied to the output tree, you could add further rules, for example <xsl:copy-of select="theia:LearningProvider"/>.
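Putting such rules together, here is one way to keep the header sections as well as the matching learners, wrapped in a copy of the Message element so the result is a single well-formed document (copy-of on the Learner elements alone can produce several top-level elements). It is a sketch that assumes Header, SourceFiles and LearningProvider are the only other children you need; anything else would have to be listed too, which is arguably where the identity-transform approach is more robust.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">
  <xsl:template match="/theia:Message">
    <!-- shallow copy of Message: keeps its namespace declarations, not its children -->
    <xsl:copy>
      <!-- sections assumed to be needed verbatim; the union returns them in document order -->
      <xsl:copy-of select="theia:Header | theia:SourceFiles | theia:LearningProvider"/>
      <!-- only Learners with at least one ZPROG001 LearnAimRef -->
      <xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001']"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```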

This doesn't cover applying the transformation in C#, but that has been answered already: How to apply an XSLT Stylesheet in C#

Hope this helps :)

andyb
  • Good job on showing how to select the desired Learner elements, especially with the namespace issue. However, the OP did show in his desired output that he wants most of the document copied; only `<Learner>` elements without the right content are to be omitted. – LarsH Oct 07 '11 at 04:55
  • Yeah, I did mention copying more nodes in the answer, and that the XSLT was not complete. +1 for your much cleaner and more complete answer. – andyb Oct 07 '11 at 06:16
  • Thanks for the answer. I had worked out the C# side of things; it was the actual XSLT file I was having trouble with. – PeteT Oct 07 '11 at 09:01