I'm trying to find the best (efficient) way of doing this.

I have a medium-sized XML document. Depending on specific settings, certain portions of it need to be filtered out for security reasons.

I'll be doing this in XSLT as it's configurable and no code should need changing.

I've looked around, but haven't had much luck.

For example:

I have the following XPath:

//*[@root='2.16.840.1.113883.3.51.1.1.6.1']

Which gives me all nodes with a root attribute equal to a specific OID. In these nodes I want to have all attributes except for a few (e.g. foo and bar) erased, and then have another attribute added (e.g. reason).

I also need to have multiple XPath expressions that can be run to zero in on a specific node and clear its contents out in a similar fashion, with respect to nodes with specific attributes.

I'm playing around with information from:

XPath expression to select all XML child nodes except a specific list?

and Remove Elements and/or Attributes by Name per XSL Parameters

Will update shortly when I have access to what I've done so far.

Example:

XML before transformation. Update: I want to filter out the extension attribute, and then all values in the document that match the value of that extension attribute:

<root>
    <childNode>
        <innerChild root="2.16.840.1.113883.3.51.1.1.6.1" extension="123" type="innerChildness"/>
        <innerChildSibling/>
    </childNode>
    <animals>
     <cat>
       <name>123</name>
     </cat>
    </animals>
    <tree/>
    <water root="2.16.840.1.113883.3.51.1.1.6.1" extension="1223" type="liquidLIke"/>
</root>

After

<root>
    <childNode>
        <innerChild root="2.16.840.1.113883.3.51.1.1.6.1" flavor="MSK"/> <!-- filtered -->
        <innerChildSibling/>
    </childNode>
    <animals>
      <cat>
        <name>****</name>
       </cat> <!-- cat was filtered -->
    </animals>
    <tree/>
    <water root="2.16.840.1.113883.3.51.1.1.6.1" flavor="MSK"/> <!-- filtered -->
</root>

I am able to use XSLT 2.0.

I'm trying this without any luck (For starters)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:param name="OIDAttrToDelete" select="'extension'"/>

    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Get all nodes for the OID -->
    <xsl:template match="//*[@root='2.16.840.1.113883.3.51.1.1.6.1']">
        <xsl:if test="name() = $OIDAttrToDelete">
            <xsl:attribute name="nullFlavor">MSK</xsl:attribute>
            <xsl:call-template name="identity"/>            
        </xsl:if>
    </xsl:template>    
</xsl:stylesheet>
Ryan Ternier
  • Ultimately what I'm hoping for is the ability to add (once I learn more... I'm no XSLT expert) additional filters to filter out the XML. It's easier to maintain than updating code each time. So each time could have different rules. – Ryan Ternier Jun 18 '12 at 22:00
  • Ryan Ternier: It isn't too challenging to implement all these requirements in an XSLT 2.0 transformation. – Dimitre Novatchev Jun 19 '12 at 03:37

2 Answers

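<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="OIDAttrToDelete" select="'extension'" />

  <!-- Identity template for non-attribute nodes. The select must include @*,
       otherwise the attribute template below is never reached. -->
  <xsl:template match="* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Decide per attribute whether to copy it.
       The .[...] filter syntax requires XPath 2.0. -->
  <xsl:template match="@*">
    <xsl:choose>
      <xsl:when test="../@root = '2.16.840.1.113883.3.51.1.1.6.1'">
        <xsl:copy-of select=".[not(contains($OIDAttrToDelete, name()))]" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>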

Notes:

I created a template that matches attributes only and decides whether to copy them or not. This way I don't have to interfere with the identity template very much.

There is no need to give a name to the identity template. Just call <xsl:apply-templates> with an appropriate select expression and the processor will invoke it automatically.

Match expressions in templates are patterns, not full XPath expressions. A pattern is tested against each node wherever it sits in the tree, so you do not need to match //*[predicate]. Using *[predicate] is enough.
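
As an illustration of that point, here is a minimal sketch of the question's stylesheet written with a plain *[predicate] pattern. It assumes the goal is to drop only the extension attribute and add nullFlavor="MSK" (the names used in the question's own attempt). It is not the only way to structure this.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copies everything not matched more specifically. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A pattern, not a path: no leading // is required. -->
  <xsl:template match="*[@root = '2.16.840.1.113883.3.51.1.1.6.1']">
    <xsl:copy>
      <!-- Copy every attribute except 'extension'. -->
      <xsl:apply-templates select="@*[name() != 'extension']"/>
      <!-- New attributes have to be written before any child nodes. -->
      <xsl:attribute name="nullFlavor">MSK</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The predicate pattern has a higher default priority than the identity template, so it wins for the marked elements without any explicit priority attribute.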

If security concerns are your reason, I would consider a white-list ($OIDAttrToKeep) instead.

If $OIDAttrToDelete is a list of values (for example comma-separated), you should include the separator in the test:

.[
  not(
    contains(
      concat(',', $OIDAttrToDelete, ','), 
      concat(',', name(), ',') 
    )
  )
]

to avoid partial name matches.
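
The white-list suggestion and the delimiter-wrapped test can be combined. The following is a sketch only: the parameter name $OIDAttrToKeep comes from the note above, while its default value 'root,foo,bar' is an assumption chosen for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed default: comma-separated names of the attributes to keep. -->
  <xsl:param name="OIDAttrToKeep" select="'root,foo,bar'"/>

  <!-- Identity template for everything else. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes of marked elements: copy only white-listed names.
       XSLT 1.0 does not allow variable references in match patterns,
       so the parameter is tested inside the template. -->
  <xsl:template match="*[@root = '2.16.840.1.113883.3.51.1.1.6.1']/@*">
    <xsl:if test="contains(concat(',', $OIDAttrToKeep, ','),
                           concat(',', name(), ','))">
      <xsl:copy/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With a white-list, an attribute that nobody anticipated is dropped by default, which is usually the safer failure mode for security filtering.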

If your parent OID should be configurable, you can use the same technique:

<xsl:template match="@*">
  <xsl:choose>
    <xsl:when test="
      contains(
        concat(',', $OIDToStrip, ','),
        concat(',', ../@root, ',')
      )
    ">
    <!-- ... -->
    </xsl:when>
  </xsl:choose>
</xsl:template>
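
For reference, a complete sketch with both lists configurable might look like the following. $OIDToStrip and $OIDAttrToDelete are the parameter names used above. The default values are assumptions, and so is the rule applied here: an attribute is dropped only when its parent carries a listed OID and its own name is listed.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed defaults; both parameters are comma-separated lists. -->
  <xsl:param name="OIDToStrip" select="'2.16.840.1.113883.3.51.1.1.6.1'"/>
  <xsl:param name="OIDAttrToDelete" select="'extension,type'"/>

  <!-- Identity template for non-attribute nodes; attributes are
       handled by the template below. -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@*">
    <!-- True when the parent has a root attribute whose value is listed. -->
    <xsl:variable name="onMarkedElement"
      select="../@root and contains(concat(',', $OIDToStrip, ','),
                                    concat(',', ../@root, ','))"/>
    <!-- True when this attribute's name is listed for deletion. -->
    <xsl:variable name="listedForDeletion"
      select="contains(concat(',', $OIDAttrToDelete, ','),
                       concat(',', name(), ','))"/>
    <xsl:if test="not($onMarkedElement and $listedForDeletion)">
      <xsl:copy/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The explicit ../@root check keeps elements without a root attribute from matching by accident when one of the lists is empty.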
Tomalak

Here is a complete XSLT 2.0 transformation that, according to an external parameter, identifies elements having a specific attribute name and value. For each such element it deletes all attributes that aren't white-listed and adds other specified attributes:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="vFilters">
     <filter>
      <markerAttribute name="root">2.16.840.1.113883.3.51.1.1.6.1</markerAttribute>
      <whiteListedAttributes>
        <name>root</name>
        <name>foo</name>
      </whiteListedAttributes>
      <addAtributes flavor="MSK" reason="Demo"/>
     </filter>
 </xsl:param>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[for $cur in .,
        $m in $vFilters/filter/markerAttribute
     return
        $cur/@*[name() eq $m/@name and . eq $m]
   ]">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:copy-of select=
     "for $m
           in $vFilters/filter/markerAttribute
       return
         if(current()/@*
                      [name() eq $m/@name
                     and
                      . eq $m
                      ])
           then
             $m/../addAtributes/@*
           else ()
     "/>
    <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>

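  <!-- some ... satisfies yields a single boolean even when $vFilters
       holds several filters: an attribute is removed if any filter that
       matches its parent element does not white-list it. -->
  <xsl:template match=
 "@*[some $m in $vFilters/filter/markerAttribute
     satisfies
         (../@*[name() eq $m/@name and . eq $m]
        and
          not(name() = $m/../whiteListedAttributes/name)
         )
    ]
  "/>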
</xsl:stylesheet>

When this transformation is applied to the following XML document (based on the provided one, with one white-listed attribute added):

<root>
    <childNode>
        <innerChild root="2.16.840.1.113883.3.51.1.1.6.1"
          a="b" b="c" foo="bar" type="innerChildness"/>
        <innerChildSibling/>
    </childNode>
    <animals>
        <cat>
            <name>bob</name>
        </cat>
    </animals>
    <tree/>
    <water root="2.16.840.1.113883.3.51.1.1.6.1"
    z="zed" l="ell" type="liquidLIke"/>
</root>

The wanted, correct result is produced: on the identified elements, all non-white-listed attributes are deleted and the two new attributes specified in the filter are added:

<root>
      <childNode>
            <innerChild root="2.16.840.1.113883.3.51.1.1.6.1" foo="bar" flavor="MSK" reason="Demo"/>
            <innerChildSibling/>
      </childNode>
      <animals>
            <cat>
                  <name>bob</name>
            </cat>
      </animals>
      <tree/>
      <water root="2.16.840.1.113883.3.51.1.1.6.1" flavor="MSK" reason="Demo"/>
</root>

Explanation:

The external parameter $vFilters can contain one or more filters such as the following:

 <filter>
  <markerAttribute name="root">2.16.840.1.113883.3.51.1.1.6.1</markerAttribute>
  <whiteListedAttributes>
    <name>root</name>
    <name>foo</name>
  </whiteListedAttributes>
  <addAtributes flavor="MSK" reason="Demo"/>
 </filter>

The markerAttribute element specifies the name and value of the identifying attribute. In this case, the filter identifies (is for) elements that have a root attribute whose value is "2.16.840.1.113883.3.51.1.1.6.1".

There are two whitelisted attribute names specified in this filter: root and foo.

Two new attributes with the specified values are to be added to every element identified by this filter: flavor="MSK" and reason="Demo".

The external parameter $vFilters can contain many filters, each identifying a different "type" of element and specifying a different set of white-listed attribute names and new attributes to be added.
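
A self-contained sketch of a two-filter setup follows. The second filter (marker attribute type with value secret) is purely hypothetical. The templates use some ... satisfies so that each predicate evaluates to a single boolean no matter how many filters are present. They also assume a deliberately strict rule: an attribute survives only if every filter that matches its element white-lists it.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="vFilters">
   <filter>
     <markerAttribute name="root">2.16.840.1.113883.3.51.1.1.6.1</markerAttribute>
     <whiteListedAttributes>
       <name>root</name>
       <name>foo</name>
     </whiteListedAttributes>
     <addAtributes flavor="MSK" reason="Demo"/>
   </filter>
   <!-- Hypothetical second filter, for illustration only. -->
   <filter>
     <markerAttribute name="type">secret</markerAttribute>
     <whiteListedAttributes>
       <name>type</name>
     </whiteListedAttributes>
     <addAtributes flavor="MSK"/>
   </filter>
 </xsl:param>

 <!-- Identity template. -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Elements carrying the marker attribute of at least one filter. -->
 <xsl:template match=
  "*[some $m in $vFilters/filter/markerAttribute
     satisfies @*[name() eq $m/@name and . eq $m]]">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <!-- Add the attributes of every filter that matches this element;
        current() is the element being processed. -->
   <xsl:copy-of select=
    "$vFilters/filter
       [some $m in markerAttribute
        satisfies current()/@*[name() eq $m/@name and . eq $m]]
       /addAtributes/@*"/>
   <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>

 <!-- Drop an attribute if any filter matching its parent element
      does not white-list it. -->
 <xsl:template match=
  "@*[some $m in $vFilters/filter/markerAttribute
      satisfies (../@*[name() eq $m/@name and . eq $m]
                 and not(name() = $m/../whiteListedAttributes/name))]"/>
</xsl:stylesheet>
```

If the opposite rule is wanted, where a single matching white-list is enough to keep an attribute, the last template's condition would need to be restructured accordingly.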

Dimitre Novatchev
  • And I had no idea XSLT could do this! thanks. This takes 4ms to parse the document. – Ryan Ternier Jun 19 '12 at 16:57
  • @RyanTernier: You are welcome. Do you have any remaining problems in using this solution? – Dimitre Novatchev Jun 19 '12 at 17:35
  • They both work great - thanks man. What scripting language is used in the XSLT? Is that a generic script language for xslt? – Ryan Ternier Jun 21 '12 at 16:21
  • @RyanTernier: This is standard XSLT 2.0 -- no extensions whatsoever are used. You are probably wondering about the match patterns -- these are absolutely valid. What probably seems strange to you is XPath 2.0 -- it is the foundation of XSLT 2.0, and every compliant XSLT 2.0 processor implements XPath 2.0. – Dimitre Novatchev Jun 21 '12 at 16:26