1

I have an XML feed in which one node contains a block of text marked up with <p> tags. However, there seems to be a space right after each opening <p> tag, which is causing issues. An example XML document is below:

<Text>
<p> Sample Text.</p> <p> Sample Text..</p> <p> Sample Text.</p> <p> Sample Text.</p> <p> Sample Text.</p>
</Text>

I would like to convert the data in the "Text" node to the form below by removing the space at the start of each <p> element's content.

<Text>
<p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Can anyone please help me with this?

Thanks

Dimitre Novatchev
user723858
  • Related question: http://stackoverflow.com/questions/4409482/how-to-trim-in-xslt – DRCB Jul 03 '12 at 09:27
  • 2
    On a general note: You might want to accept more answers. Currently you have asked 16 questions but only accepted 3 answers. This is not very nice. – Tomalak Jul 03 '12 at 10:24
  • 1
    Thank you Dimitre, I have corrected my acceptance status too. Apologies to all who have answered my questions, I do really appreciate this :-) – user723858 Jul 03 '12 at 13:14

5 Answers

2

I. Non-recursive XSLT 1.0 solution that removes the starting group of any number of consecutive spaces:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="p/text()">
  <xsl:value-of select=
   "substring-after
     (.,
      substring-before
        (.,
         substring
           (translate(., ' ', ''), 1, 1)
         )
      )"/>
 </xsl:template>
</xsl:stylesheet>

When applied on the provided XML document:

<Text>
    <p> Sample Text.</p> <p> Sample Text..</p> <p> Sample Text.</p> <p> Sample Text.</p> <p> Sample Text.</p>
</Text>

the wanted, correct result is produced:

<Text>
    <p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Explanation:

The idea is to:

  1. Obtain the first non-space character.

  2. Obtain the string of spaces preceding this character (obtained in 1.).

  3. Obtain the string that immediately follows that string of spaces (obtained in 2.).
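
The three steps above can be traced outside XSLT as well. The following is only a sketch in Python, not part of the stylesheet: the helper names substring_before, substring_after and ltrim_spaces are made up for illustration, and the helpers are assumed to mirror the XPath 1.0 functions of the same name (including their behavior for an empty second argument).

```python
def substring_before(s: str, sep: str) -> str:
    """Mirror of XPath substring-before: text before the first sep, '' if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    """Mirror of XPath substring-after: text after the first sep, '' if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def ltrim_spaces(s: str) -> str:
    # Step 1: first non-space character, like substring(translate(., ' ', ''), 1, 1)
    first = s.replace(" ", "")[:1]
    # Step 2: the run of spaces that precedes that character
    leading = substring_before(s, first)
    # Step 3: everything that follows that run of spaces
    return substring_after(s, leading)


print(repr(ltrim_spaces(" Sample Text.")))
print(repr(ltrim_spaces("   ")))
```

Only the space character is stripped, as in the stylesheet. A string that consists solely of spaces comes back unchanged, because there is no non-space character to search for.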


II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="p/text()">
  <xsl:sequence select="replace(., '^\s+(.+)$', '$1')"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the same XML document (above), the same correct result is produced:

<Text>
    <p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>

Do note:

Martin Honnen has proposed to use:

replace(., '^\s+', '')

While this is shorter than:

replace(., '^\s+(.+)$', '$1')

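the latter replaces the whole string in a single step. Because `^` anchors the pattern at the start of the string (no `m` flag is set), the shorter form can also match at most once, so any efficiency difference is likely small. Also note that the longer form leaves the string unchanged if the text contains a line break, since `.` does not match a newline unless the `s` flag is set.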
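
One way to compare the two patterns is to count how many substitutions each one performs. The sketch below uses Python's re module purely as a stand-in: its regex dialect is not identical to the XPath 2.0 one, so treat the counts as an indication rather than a statement about any particular XSLT processor. The function names trim_short and trim_long are made up for illustration.

```python
import re


def trim_short(s: str):
    # Counterpart of replace(., '^\s+', ''); returns (result, number of substitutions)
    return re.subn(r"^\s+", "", s)


def trim_long(s: str):
    # Counterpart of replace(., '^\s+(.+)$', '$1'); returns (result, number of substitutions)
    return re.subn(r"^\s+(.+)$", r"\1", s)


for sample in ["  Sample  Text.", "  one\ntwo"]:
    print(repr(sample), trim_short(sample), trim_long(sample))
```

In this stand-in, an anchored pattern can match only at the start of the string, and the longer pattern does not match at all when the text spans several lines.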

Update: The OP wasn't able to use the XSLT 2.0 solution. In a comment he writes:

I am now thinking that what appears to be a space may in fact be a tab, how would i go about checking this and then removing it?

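The solution is just to use the following (note that `\s` already matches tab, line feed and carriage return, so the explicit character references are redundant but harmless):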

replace(., '^[\s&#9;&#10;&#13;]+(.+)$', '$1')
Dimitre Novatchev
  • I have tried both regular expressions with no luck. I was using a variable to store the data, and I have even tried the following to make sure that something is being replaced, however it is not: after doing this nothing changes. I have only added this replace to the stylesheet, did I need to add as well?? Thanks :-) – user723858 Jul 03 '12 at 13:40
  • @user723858: I always test my solutions and actually copy/paste the results into the answer. There may be several reasons for not being able to repro the results: 1. You are not running an XSLT 2.0 processor. 2. You have modified the XSLT code of the solution. 3. You are applying the transformation on a different XML document. 4. Both 2. and 3. Check and verify which of the reasons 1 - 4 applies in your case. – Dimitre Novatchev Jul 03 '12 at 14:31
  • I have tried checking all the elements and it should work, I am now thinking that what appears to be a space may in fact be a tab, how would i go about checking this and then removing it? Thanks for the help so far :-) – user723858 Jul 03 '12 at 15:29
  • @user723858: Then use: `replace(., '^[\s ]+(.+)$', '$1')` Read the update to the answer. – Dimitre Novatchev Jul 03 '12 at 16:12
  • I would first like to say thanks for the continued support :-) I have tried the following: however this didn't work. I then tried the following: This did work, however it removed the first space and then all spaces. I don't understand why this isn't working?? – user723858 Jul 03 '12 at 19:31
  • @user723858: Please ask a new question and give complete code examples (but as small as possible) of what you have tried. The comments format is inconvenient for sharing code. – Dimitre Novatchev Jul 03 '12 at 19:43
1

Use the identity transformation template

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

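plus a template for the first text node child of a p element, which drops that node's first character (this assumes the text always starts with exactly one unwanted space)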

<xsl:template match="p/text()[1]">
  <xsl:value-of select="substring(., 2)"/>
</xsl:template>
Martin Honnen
0

Try the following XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" />
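    <!-- Identity template: copies nodes and attributes, so attributes are not dropped -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>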

    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)" />
    </xsl:template>

</xsl:stylesheet>
Ravish
  • This butchers any whitespace in any text node. Could be a little too much. – Tomalak Jul 03 '12 at 09:42
  • I feel it just removes the preceding and trailing spaces and seems like the requirement makes sense to remove such spaces. – Ravish Jul 03 '12 at 10:37
  • It also removes the spaces between the p tags, and - as I said - the spaces in any other text node of the input. It might be what the OP needs, but it's not what he asked for, that's all I'm saying. – Tomalak Jul 03 '12 at 10:55
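
The effect discussed in these comments can be reproduced with a small stand-in for normalize-space(). This is a sketch in Python, assuming the XPath definition of whitespace (space, tab, carriage return, line feed); the function name normalize_space is made up for illustration and is not a library call.

```python
import re

XML_WHITESPACE = " \t\r\n"


def normalize_space(s: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    stripped = s.strip(XML_WHITESPACE)
    return re.sub(r"[ \t\r\n]+", " ", stripped)


# Text node inside a <p>: the leading space is removed, as requested.
print(repr(normalize_space(" Sample Text.")))
# Whitespace-only text node between two <p> elements: it is emptied as well.
print(repr(normalize_space(" ")))
# Inner runs of whitespace are collapsed too.
print(repr(normalize_space("Sample   Text.")))
```

Because the template matches every text node, the separator between adjacent <p> elements is reduced to an empty string in the same way.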
0

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Text/p/text()[1]">
    <xsl:call-template name="ltrim" />
  </xsl:template>

  <xsl:template name="ltrim">
    <xsl:param name="start" select="1" />

    <xsl:choose>
      <xsl:when test="substring(., $start, 1) = ' '">
        <xsl:call-template name="ltrim">
          <xsl:with-param name="start" select="$start + 1" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring(., $start)" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

left-trims any spaces at the start of a <p> tag's contents only (the test compares against ' ', so tabs and line breaks are not removed).

It leaves all other whitespace alone. For your XML it returns:

<Text>
<p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>
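
The recursion in the ltrim template can be followed more easily in a few lines of ordinary code. This is a sketch in Python that mirrors the template's logic under the assumption that only the space character is tested; the function name ltrim and its start parameter are named after the template for illustration only.

```python
def ltrim(text: str, start: int = 1) -> str:
    """Recursive left trim; start is a 1-based position, as in XPath."""
    # Counterpart of substring(., $start, 1) = ' '
    if text[start - 1:start] == " ":
        # Same call again, one position further to the right
        return ltrim(text, start + 1)
    # Counterpart of substring(., $start): the rest of the string
    return text[start - 1:]


print(repr(ltrim("  Sample Text.")))
print(repr(ltrim("\tSample Text.")))
```

A leading tab is left in place, because the comparison is against a single space only.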
Tomalak
-1

Use the Simple Search-Replace utility: http://www.rjlsoftware.com/software/utility/search/

Spacedust