I. Non-recursive XSLT 1.0 solution that removes the starting group of any number of consecutive spaces:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Strip the leading spaces from each text node that is a child of p -->
  <xsl:template match="p/text()">
    <xsl:value-of select=
     "substring-after
        (.,
         substring-before
           (.,
            substring
              (translate(., ' ', ''), 1, 1)
           )
        )"/>
  </xsl:template>
</xsl:stylesheet>
When applied on the provided XML document:
<Text>
<p> Sample Text.</p> <p> Sample Text..</p> <p> Sample Text.</p> <p> Sample Text.</p> <p> Sample Text.</p>
</Text>
the wanted, correct result is produced:
<Text>
<p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>
Explanation:
The idea is to:
1. Obtain the first non-space character.
2. Obtain the string of spaces preceding this character (obtained in 1.).
3. Obtain the string that immediately follows that string of spaces (obtained in 2.).
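The three steps can be made explicit by giving each intermediate result a name. The following is only a sketch of the same technique, not a different solution; the variable names vFirstChar and vLeadingSpaces are hypothetical and chosen purely for illustration:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p/text()">
    <!-- Step 1: the first non-space character
         (the empty string if the text consists only of spaces) -->
    <xsl:variable name="vFirstChar"
         select="substring(translate(., ' ', ''), 1, 1)"/>

    <!-- Step 2: the run of spaces that precedes that character
         (hypothetical variable name, for illustration only) -->
    <xsl:variable name="vLeadingSpaces"
         select="substring-before(., $vFirstChar)"/>

    <!-- Step 3: everything that follows that run of spaces -->
    <xsl:value-of select="substring-after(., $vLeadingSpaces)"/>
  </xsl:template>
</xsl:stylesheet>
```

If the text has no leading spaces, step 2 yields the empty string and step 3 returns the text unchanged. The same happens when the text consists only of spaces, so such a text node is left as it is.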
II. XSLT 2.0 solution:
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p/text()">
    <xsl:sequence select="replace(., '^\s+(.+)$', '$1')"/>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied on the same XML document (above), the same correct result is produced:
<Text>
<p>Sample Text.</p> <p>Sample Text..</p> <p>Sample Text.</p> <p>Sample Text.</p> <p>Sample Text.</p>
</Text>
Do note:
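Martin Honnen has proposed using:
replace(., '^\s+', '')
This is shorter than:
replace(., '^\s+(.+)$', '$1')
and, because ^ anchors the match at the start of the string (no m flag is used), it also performs at most one replacement, so efficiency is not a reason to prefer the longer form. The two can also behave differently: without the s flag, . does not match a newline, so the longer pattern fails to match text that contains a newline and leaves it unchanged, while the shorter one still strips the leading whitespace.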
Update: The OP wasn't able to use the XSLT 2.0 solution. In a comment he writes:
I am now thinking that what appears to be a space may in fact be a
tab, how would i go about checking this and then removing it?
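The solution is just to add the suspect characters to the character class. Inside a stylesheet, write them as character references (for example &#x9; for a tab), because a literal tab in an attribute value is normalized to a space. Note that \s already matches a tab: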
replace(., '^[\s	 ]+(.+)$', '$1')
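For an XSLT 1.0 processor, the non-recursive approach from part I could probably be adapted in the same spirit, by removing every whitespace character (not just the space) when looking for the first significant character. This is a minimal sketch under that assumption; the variable names vWs and vFirstChar are hypothetical, and the set of characters in vWs is a guess that should be adjusted to whatever actually appears in the data:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p/text()">
    <!-- Characters treated as whitespace: space, tab, line feed,
         carriage return (assumed set; extend it if needed) -->
    <xsl:variable name="vWs" select="' &#x9;&#xA;&#xD;'"/>

    <!-- First character that is not in the whitespace set -->
    <xsl:variable name="vFirstChar"
         select="substring(translate(., $vWs, ''), 1, 1)"/>

    <!-- Drop everything that precedes that character -->
    <xsl:value-of select=
     "substring-after(., substring-before(., $vFirstChar))"/>
  </xsl:template>
</xsl:stylesheet>
```

The tab, line feed and carriage return are written as character references so that the XML parser does not normalize them to spaces inside the attribute value.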