Remove space after <p> tag in text node XSLT

Question

I currently have an XML feed in use, and one of its nodes contains a bunch of text wrapped in <p> tags. However, after each opening tag there seems to be a space, which is causing issues. An example XML document is below:

<Text>
<p> Sample Text.</p> <p> Sample Text..</p> <p> Sample Text.</p> <p> Sample Text.</p> <p> Sample Text.</p>
</Text>

I would like to convert the data in the "Text" node to the form below, removing the space at the start of each <p> element.

<Text>
<p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Can anyone please help me with this?

Thanks

Related question: http://stackoverflow.com/questions/4409482/how-to-trim-in-xslt — DRCB, Jul 03 '12 at 09:27
On a general note: You might want to accept more answers. Currently you have asked 16 questions but only accepted 3 answers. This is not very nice. — Tomalak, Jul 03 '12 at 10:24
Thank you Dimitre, I have corrected my acceptance status too. Apologies to all who have answered my questions; I do really appreciate this :-) — user723858, Jul 03 '12 at 13:14

Dimitre Novatchev · Accepted Answer · 2012-07-03T16:22:07.653

I. Non-recursive XSLT 1.0 solution that removes the starting group of any number of consecutive spaces:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="p/text()">
  <xsl:value-of select=
   "substring-after
     (.,
      substring-before
        (.,
         substring
           (translate(., ' ', ''), 1, 1)
         )
      )"/>
 </xsl:template>
</xsl:stylesheet>

When applied on the provided XML document:

<Text>
    <p> Sample Text.</p> <p> Sample Text..</p> <p> Sample Text.</p> <p> Sample Text.</p> <p> Sample Text.</p>
</Text>

the wanted, correct result is produced:

<Text>
    <p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Explanation:

The idea is to:

1. Obtain the first non-space character.
2. Obtain the string of spaces preceding this character (obtained in 1.).
3. Obtain the string that immediately follows that string of spaces (obtained in 2.).
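
The three steps can also be spelled out with one variable per step. This is only a sketch of the same expression, not a different algorithm. The variable names firstChar and leadingSpaces are made up for illustration, and the wider translate() argument mentioned in the comment is an assumption that is not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <!-- Identity rule: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="p/text()">
   <!-- Step 1: first non-space character (empty string if there is none).
        To also skip tabs and line breaks, the second argument could be
        ' &#9;&#10;&#13;' instead of ' ' (an assumption, not in the answer). -->
   <xsl:variable name="firstChar"
        select="substring(translate(., ' ', ''), 1, 1)"/>
   <!-- Step 2: the run of spaces that precedes that character -->
   <xsl:variable name="leadingSpaces"
        select="substring-before(., $firstChar)"/>
   <!-- Step 3: everything that follows that run of spaces -->
   <xsl:value-of select="substring-after(., $leadingSpaces)"/>
 </xsl:template>
</xsl:stylesheet>
```

For the text node " Sample Text." the two variables would hold "S" and a single space, so the output is "Sample Text.". A text node with no leading space should be copied unchanged, because substring-after() with an empty second argument returns the whole string.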

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="p/text()">
  <xsl:sequence select="replace(., '^\s+(.+)$', '$1')"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the same XML document (above), the same correct result is produced:

<Text>
    <p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Do note:

Martin Honnen has proposed to use:

replace(., '^\s+', '')

While this is shorter than:

replace(., '^\s+(.+)$', '$1')

both perform at most one replacement per text node, because without the "m" flag ^ matches only at the very start of the string, so the shorter form is no less efficient.
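
For comparison, the shorter form can be dropped into the same stylesheet. This is a minimal sketch, assuming an XSLT 2.0 processor. Since the pattern has no (.+)$ part, it should also trim text nodes that contain line breaks, which "." does not match unless the "s" flag is given.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

 <!-- Identity rule: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Delete the run of whitespace anchored at the start of the text node;
      \s already covers space, tab, line feed and carriage return -->
 <xsl:template match="p/text()">
  <xsl:sequence select="replace(., '^\s+', '')"/>
 </xsl:template>
</xsl:stylesheet>
```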

Update: The OP wasn't able to use the XSLT 2.0 solution; in a comment he writes:

I am now thinking that what appears to be a space may in fact be a tab, how would I go about checking this and then removing it?

The solution is just to use:

replace(., '^[\s&#9;&#10;&#13;]+(.+)$', '$1')

I have tried both regular expressions with no luck. I was using a variable to store the data as I have even tried the following to make sure that something is being replaced however it is not: after doing this nothing changes. I have only added this replace to the stylesheet, did I need to add as well?? Thanks :-) — user723858, Jul 03 '12 at 13:40
@user723858: I always test my solutions and actually copy/paste the results into the answer. There may be several reasons for not being able to repro the results: 1. You are not running an XSLT 2.0 processor. 2. You have modified the XSLT code of the solution. 3. You are applying the transformation on a different XML document. 4. Some combination of the above. Check and verify which of the reasons 1 - 4 applies in your case. — Dimitre Novatchev, Jul 03 '12 at 14:31
I have tried checking all the elements and it should work, I am now thinking that what appears to be a space may in fact be a tab, how would I go about checking this and then removing it? Thanks for the help so far :-) — user723858, Jul 03 '12 at 15:29
@user723858: Then use: `replace(., '^[\s&#9;&#10;&#13;]+(.+)$', '$1')` Read the update to the answer. — Dimitre Novatchev, Jul 03 '12 at 16:12
I would first like to say thanks for the continued support :-) I have tried the following: however this didn't work. I then tried the following: This did work however it has removed the first space and then all spaces, I don't understand why this isn't working?? — user723858, Jul 03 '12 at 19:31
@user723858: Please, ask a new question and give complete code examples (but as small as possible) of what you have tried. The comments format is inconvenient for sharing code. — Dimitre Novatchev, Jul 03 '12 at 19:43

score 1 · Answer 2 · answered Jul 03 '12 at 09:30

Use the identity transformation template

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

plus a template for the first text node child of a p element

<xsl:template match="p/text()[1]">
  <xsl:value-of select="substring(., 2)"/>
</xsl:template>
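
Note that substring(., 2) drops the first character whatever it is. A hedged variant, assuming the unwanted character is a plain space whenever it is present, adds a predicate so that only text nodes that actually start with a space are shortened:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only the first text node of a p, and only if it starts with a space -->
  <xsl:template match="p/text()[1][starts-with(., ' ')]">
    <xsl:value-of select="substring(., 2)"/>
  </xsl:template>
</xsl:stylesheet>
```

This still removes exactly one space; several leading spaces would need one of the trimming approaches from the other answers.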

score 0 · Answer 3 · answered Jul 03 '12 at 09:27

Try the following XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" />
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)" />
    </xsl:template>

</xsl:stylesheet>

Ravish

This butchers any whitespace in any text node. Could be a little too much. – Tomalak Jul 03 '12 at 09:42
I feel it just removes the preceding and trailing spaces and seems like the requirement makes sense to remove such spaces. – Ravish Jul 03 '12 at 10:37
It also removes the spaces between the p tags, and - as I said - the spaces in any other text node of the input. It might be what the OP needs, but it's not what he asked for, that's all I'm saying. – Tomalak Jul 03 '12 at 10:55

score 0 · Answer 4 · answered Jul 03 '12 at 09:34

This template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Text/p/text()[1]">
    <xsl:call-template name="ltrim" />
  </xsl:template>

  <xsl:template name="ltrim">
    <xsl:param name="start" select="1" />

    <xsl:choose>
      <xsl:when test="substring(., $start, 1) = ' '">
        <xsl:call-template name="ltrim">
          <xsl:with-param name="start" select="$start + 1" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring(., $start)" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

left-trims any spaces at the start of a <p> tag's contents only.

It leaves all other whitespace alone. For your XML it returns:

<Text>
<p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

The question is tagged as xslt-2.0 so instead of a recursive template it should suffice to use ``. — Martin Honnen, Jul 03 '12 at 10:22
@Martin: Absolutely, you're right. I missed the 2.0 tag. The regex is much simpler. — Tomalak, Jul 03 '12 at 10:57
Tomalak: You might be interested to see a *non-recursive* XSLT 1.0 solution. — Dimitre Novatchev, Jul 03 '12 at 12:47
@Dimitre Right! Why didn't I think of that?! +1 to your approach. — Tomalak, Jul 03 '12 at 12:57

score -1 · Answer 5 · answered Jul 03 '12 at 09:19

Use Simple Search-Replace utility: http://www.rjlsoftware.com/software/utility/search/

Spacedust
