
I want to transform a pair of processing instructions in a source XML document into an element in the output.

Input

<?xml version="1.0" encoding="utf-8"?>
<root>
    <?PI_start?> SOME TEXT <?PI_end?>
</root>

I want the output XML to look like this:

<root>
    <tag> SOME TEXT </tag>
</root>

Can I do this? If so, what XSL should I use for the transformation?

So far I have only found a way to transform a single PI into an opening and closing tag, where the PI itself carries the content.

Input XML

<root>
    <?PI SOME TEXT?>
</root>

XSL

<xsl:template match="processing-instruction('PI')">
    <tag><xsl:value-of select="."/></tag>
</xsl:template>

Output

<tag>SOME TEXT</tag>

But this is not quite my case.

Nawa

2 Answers


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="processing-instruction('PI_start')">
  <tag>
   <xsl:apply-templates mode="copy" select=
       "following-sibling::node()[1][self::text()]"/>
  </tag>
 </xsl:template>

 <xsl:template match=
 "processing-instruction('PI_end')
 |
  text()[preceding-sibling::node()[1]
              [self::processing-instruction('PI_start')]]
 "/>
</xsl:stylesheet>

when applied on the provided XML document:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <?PI_start?> SOME TEXT <?PI_end?>
</root>

produces the wanted, correct result:

<root>
   <tag> SOME TEXT </tag>
</root>

Do note:

  1. The identity rule is used to copy all nodes "as-is".

  2. We have additional templates only for nodes that should be changed in some way.

  3. The template matching the first PI "does almost all the work". It creates a tag element and applies templates to the immediately following sibling node if it is a text node.

  4. We apply templates in mode "copy" to the text node that is the immediate following sibling of the first PI.

  5. No template is defined in mode "copy", so the built-in template for text nodes is selected; its action is simply to copy the text. This is a trick that saves us from having to define a template in the "copy" mode.

  6. We have an empty template that actually deletes the unwanted nodes: the second PI and what would be a second copy of the first PI's immediate-sibling text node.
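
Point 5 relies on the built-in rule for text nodes. For readers who would rather see that behaviour spelled out, a template like the one below could be added to the stylesheet above. This is only a sketch and not part of the original answer; it should leave the output unchanged, since it does what the built-in rule already does.

```xml
<!-- Optional: makes explicit what the built-in rule already does
     for a text node processed in mode "copy". -->
<xsl:template match="text()" mode="copy">
  <xsl:value-of select="."/>
</xsl:template>
```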

Update: The OP has indicated that he is also interested in the case where in-between the two PIs there might be different nodes (not only text nodes).

This is a much more complex task; here is one solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kSurrounded" match="node()"
  use="concat(
        generate-id(preceding-sibling::processing-instruction('PI_start')[1]),
        '+++',
        generate-id(following-sibling::processing-instruction('PI_end')[1])
             )"/>

 <xsl:template match="node()|@*" name="identity">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="processing-instruction('PI_start')">
  <tag>
   <xsl:apply-templates mode="copy" select=
       "key('kSurrounded',
             concat(generate-id(),
                   '+++',
                   generate-id(following-sibling::processing-instruction('PI_end')[1])
                   )
             )"/>
  </tag>
 </xsl:template>

 <xsl:template match=
 "processing-instruction('PI_end')
 |
  node()[(preceding-sibling::processing-instruction('PI_start')
         |
          preceding-sibling::processing-instruction('PI_end')
          )
           [last()][self::processing-instruction('PI_start')]
        and
         (following-sibling::processing-instruction('PI_start')
        |
          following-sibling::processing-instruction('PI_end')
          )
           [1][self::processing-instruction('PI_end')]
        ]
 "/>

 <xsl:template match="node()" mode="copy">
  <xsl:call-template name="identity"/>
 </xsl:template>
</xsl:stylesheet>

when the above transformation is applied on the following XML document:

<root>
    <?PI_start?> <strong>Some</strong> TEXT <?PI_end?> XA <?PI_end?>
</root>

the wanted, correct output is produced:

<root>
    <tag>
        <strong>Some</strong> TEXT 
    </tag> XA 
</root>
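
If an XSLT 2.0 processor is available (an assumption; the stylesheets above are XSLT 1.0), the same result can probably be reached by grouping, which visits each node only once. The sketch below is not from the original answer. It assumes that the two PIs are siblings and that marked ranges do not nest; a PI_start with no later PI_end in its group is simply dropped.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copies every node as-is -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <!-- Any element that has a PI_start child: split its children into
      groups, each new group beginning at a PI_start -->
 <xsl:template match="*[processing-instruction('PI_start')]">
   <xsl:copy>
     <xsl:apply-templates select="@*"/>
     <xsl:for-each-group select="node()"
          group-starting-with="processing-instruction('PI_start')">
       <!-- First PI_end inside the current group, if there is one -->
       <xsl:variable name="end" select=
          "current-group()[self::processing-instruction('PI_end')][1]"/>
       <xsl:choose>
         <xsl:when test="self::processing-instruction('PI_start') and $end">
           <!-- Nodes strictly between PI_start and PI_end go inside tag -->
           <tag>
             <xsl:apply-templates select=
                "current-group()[position() gt 1][. &lt;&lt; $end]"/>
           </tag>
           <!-- Nodes after PI_end stay outside tag -->
           <xsl:apply-templates select="current-group()[. &gt;&gt; $end]"/>
         </xsl:when>
         <xsl:otherwise>
           <!-- Leading group, or a PI_start without a matching PI_end -->
           <xsl:apply-templates select="current-group()"/>
         </xsl:otherwise>
       </xsl:choose>
     </xsl:for-each-group>
   </xsl:copy>
 </xsl:template>

 <!-- The marker PIs themselves never reach the output -->
 <xsl:template match=
  "processing-instruction('PI_start')|processing-instruction('PI_end')"/>
</xsl:stylesheet>
```

Applied to the document above, this should wrap the strong element and " TEXT " in tag and leave " XA " outside it, as in the XSLT 1.0 result; the exact whitespace depends on the serializer.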
Mads Hansen
Dimitre Novatchev
  • @Dimitre: Isn't the "strip text nodes" rule very general? Wouldn't a grouping-between-tags approach be better? –  Nov 12 '10 at 14:40
  • @Alejandro: No, why do you think that this directive should not be used? – Dimitre Novatchev Nov 12 '10 at 14:43
  • @Dimitre: Maybe `text()[preceding-sibling::node()[1][self::processing-instruction('PI_start')]]` would be better. Don't you think? –  Nov 12 '10 at 14:53
  • @Alej, yes, I was going to say the same thing. In other words, @Dimitre's code would lose any text that immediately follows ``, if the input can have such text. The example didn't, but we don't know that about the real input. – LarsH Nov 12 '10 at 15:19
  • @Alejandro, @LarsH: C'mon guys, I don't understand at all what you are saying. Which template? – Dimitre Novatchev Nov 12 '10 at 15:32
  • @LarsH: OK, I see what you're saying: Yes, I fixed this. – Dimitre Novatchev Nov 12 '10 at 15:36
  • Great answer! In my concrete use case I have XML input that is transformed to HTML. The output HTML is a visual presentation of the XML data. The input contains some metadata about errors, and I want to show these errors in the HTML. For example I have and this link is incorrect. In html this will look . But the error scope bounded by processing instructions may contain several tags. – Nawa Nov 12 '10 at 15:51
  • @Dimitre: Now it is excellent. +1 Also for using built-in rules propagation in every mode. –  Nov 12 '10 at 15:52
  • @Dimitre: @Nawa just wrote *But error scope bounded by Process Instructions may have several tags*. This leads to the grouping-between-marks approach... –  Nov 12 '10 at 15:54
  • @Alejandro: What does @Nawa want to say? I don't understand it. – Dimitre Novatchev Nov 12 '10 at 17:05
  • @Nawa, @Alejandro, @Lars: I have edited my answer and added the solution for the more complex problem. – Dimitre Novatchev Nov 12 '10 at 17:28
  • The proposed solution has one drawback: it leads to double processing of the tags between the processing instructions. Can you suggest how to avoid applying templates to the tags between the processing instructions after finishing template - ? – Nawa Nov 18 '10 at 12:53
  • @Nawa: There is no "double processing". There is no processing in the template with the empty body. It is true that some nodes are selected by more than one template. However, the second template does nothing with them; it simply ignores them. The additional performance cost of this is negligible. On the other hand, not being limited to fine-grained, one-node-at-a-time processing allows the (theoretical) possibility of templates being executed in parallel on a multi-core architecture. – Dimitre Novatchev Nov 18 '10 at 13:37
  • @DimitreNovatchev - Beautiful Explanation Sir, Have added your answer to my favorite list. – BatScream May 27 '15 at 01:42

I propose another way to solve my problem. I hope it is correct.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" />
<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
</xsl:template>
<xsl:template match="processing-instruction('PI_start')">
    <xsl:text disable-output-escaping="yes"><![CDATA[<tag>]]></xsl:text>
</xsl:template>

<xsl:template match="processing-instruction('PI_end')">
    <xsl:text disable-output-escaping="yes"><![CDATA[</tag>]]></xsl:text>
</xsl:template>
</xsl:stylesheet>

Input

<root>
    <?PI_start?> <strong>Some</strong> TEXT <?PI_end?> XA <?PI_end?>
</root>

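The following output is produced. Note that it is not well-formed XML: the second <?PI_end?> in this input emits a </tag> with no matching <tag>. Also, disable-output-escaping is an optional feature and only takes effect when the processor itself serializes the result.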

<root>
    <tag> <strong>Some</strong> TEXT </tag> XA </tag>
</root>
Nawa