
Can someone explain the difference between the text() and string() functions? I often use one or the other, but it does not make any difference: both get the string value of the XML node.

Nakilon
Jayy

2 Answers


Can someone explain the difference between text() and string() functions.

I. text() isn't a function but a node test.

It is used to select all text-node children of the context node.

So, if the context node is an element named x, then text() selects all text-node children of x.

Another example:

/a/b/c/text()

selects all text-node children of any c element that is a child of any b element that is a child of the top element a.
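A minimal Python sketch can make the node-test semantics concrete. It uses only the standard-library xml.dom.minidom parser and assumes a document without CDATA sections; the helpers child_elements and text_children are hypothetical names that mimic the location steps of /a/b/c/text() by hand, and they are not an XPath engine.

```python
from xml.dom import minidom, Node

def child_elements(nodes, name):
    """Mimic a child::name step: the element children called `name` of each node."""
    return [child
            for node in nodes
            for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == name]

def text_children(nodes):
    """Mimic a child::text() step: only the direct text-node children."""
    return [child.data
            for node in nodes
            for child in node.childNodes
            if child.nodeType == Node.TEXT_NODE]

doc = minidom.parseString("<a><b><c>x<d>skipped</d>y</c></b></a>")

# Evaluate /a/b/c/text() one location step at a time, starting at the root.
c_elements = child_elements(child_elements(child_elements([doc], "a"), "b"), "c")
print(text_children(c_elements))  # ['x', 'y']; "skipped" is a child of <d>, not of <c>
```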

II. The string() function

By definition, string(exprSelectingASingleNode) returns the string value of the selected node.

The string value of an element is the concatenation of all of its text-node descendants, in document order.

Therefore, given the following XML document:

<a>
  <b>2</b>
  <c>3
    <d>4</d>
  </c>
  5
</a>

string(/a) returns (without the surrounding quotes):

"
  2
  3
    4

  5
"

As we can see, the string value includes three whitespace-only text nodes, which we typically fail to notice and account for.
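The string value above can be checked with a small Python sketch (an illustration, not an XPath implementation). It assumes the standard-library xml.dom.minidom parser, which keeps whitespace-only text nodes; the helper names string_value and whitespace_only_text_nodes are hypothetical.

```python
from xml.dom import minidom, Node

def string_value(node):
    """String value of an element: its descendant text nodes, concatenated in document order."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    # Comments and processing instructions have no children, so they contribute "".
    return "".join(string_value(child) for child in node.childNodes)

def whitespace_only_text_nodes(node):
    """Collect descendant text nodes that contain nothing but whitespace."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            if not child.data.strip():
                found.append(child)
        else:
            found.extend(whitespace_only_text_nodes(child))
    return found

xml = "<a>\n  <b>2</b>\n  <c>3\n    <d>4</d>\n  </c>\n  5\n</a>"
a = minidom.parseString(xml).documentElement
print(repr(string_value(a)))               # '\n  2\n  3\n    4\n  \n  5\n'
print(len(whitespace_only_text_nodes(a)))  # 3
```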

Some XML parsers have an option to strip whitespace-only text nodes. If the above document were parsed with whitespace-only text nodes stripped, then the same expression:

string(/a)

now returns:

"23
    4
  5
"
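The effect of stripping can be sketched the same way. The hypothetical helper strip_whitespace_only_text emulates a parser's whitespace-stripping option by deleting whitespace-only text nodes after parsing with xml.dom.minidom (an assumption made for illustration; a parser with such an option would do this while building the tree).

```python
from xml.dom import minidom, Node

def strip_whitespace_only_text(node):
    """Remove whitespace-only text nodes from the subtree, emulating a stripping parser."""
    for child in list(node.childNodes):  # copy the list: we mutate it while iterating
        if child.nodeType == Node.TEXT_NODE:
            if not child.data.strip():
                node.removeChild(child)
        else:
            strip_whitespace_only_text(child)

def string_value(node):
    """String value of an element: its descendant text nodes, concatenated in document order."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)

xml = "<a>\n  <b>2</b>\n  <c>3\n    <d>4</d>\n  </c>\n  5\n</a>"
a = minidom.parseString(xml).documentElement
strip_whitespace_only_text(a)
print(repr(string_value(a)))  # '23\n    4\n  5\n'
```

Only the nodes consisting entirely of whitespace disappear; the newline and indentation inside "3\n    " and "\n  5\n" survive because those text nodes also contain non-whitespace characters.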
mklement0
Dimitre Novatchev

Most of the time, if you want the content of an element node X, you can refer to it as "." if it's the context node, or as "X" if it's a child of the context node. For example:

<xsl:if test="X = 'abcd'">...

or

<xsl:value-of select="."/>

In both cases, because the context demands a string, the string() function is applied automatically. (That's a slight simplification: if you're running schema-aware XSLT 2.0, the rules are a little more complicated.)

Using "string()" here is unnecessary, because it's done automatically; and using text() is a mistake (one that seems to be increasingly common, encouraged by some bad tutorials on the web). Using ./text() or X/text() in this situation gives you all the text-node children of the element. Often the element has a single text-node child whose string value happens to be the same as the string value of the element, but your code fails if someone adds a comment or processing instruction, because the value is then split into multiple text nodes. It also fails if the element is one (say "title") that allows mixed content: string(title) and title/text() give the same answer until you hit an article with the title:

<title>On the wetness of H<sub>2</sub>O</title>
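Both failure modes can be reproduced with a small Python sketch using the standard-library xml.dom.minidom module, which keeps comment nodes by default. The helpers text_children and string_value are hypothetical stand-ins for title/text() and string(title), not an XPath implementation.

```python
from xml.dom import minidom, Node

def text_children(node):
    """What node/text() selects: only the direct text-node children."""
    return [c.data for c in node.childNodes if c.nodeType == Node.TEXT_NODE]

def string_value(node):
    """What string(node) returns: all descendant text, in document order."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(c) for c in node.childNodes)

# A comment splits the element's text into two separate text nodes.
note = minidom.parseString("<title>ab<!-- edited -->cd</title>").documentElement
print(text_children(note), string_value(note))  # ['ab', 'cd'] abcd

# Mixed content: the "2" lives inside <sub>, so title/text() never sees it.
title = minidom.parseString(
    "<title>On the wetness of H<sub>2</sub>O</title>").documentElement
print(text_children(title))  # ['On the wetness of H', 'O']
print(string_value(title))   # On the wetness of H2O
```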
Dimitre Novatchev
Michael Kay
  • I used to believe that using an expression like select="/somenode/text()" would indeed be more precise and less error-prone. So you are suggesting the /text() part is unnecessary, even a mistake? Could you please discuss briefly when it is a good idea or useful to use /text() then? Thanks – user8658912 Oct 02 '17 at 22:39
  • Comments are not designed for asking supplementary questions, please raise a new question. – Michael Kay Oct 02 '17 at 22:43
  • (in other words) Your statement that _"string() is applied automatically"_ is true in several XSLT elements, like xsl:value-of. However, note that the user doesn't mention XSLT at all in his question. So, as you explained, using text() or string() is usually wrong/unnecessary **in XSLT**, but it does make sense to use string(), for instance in XQuery, when expecting the string value of a node; otherwise we would get the full node instead, right? Whereas text() would be appropriate for specifically browsing through text nodes. – user8658912 Oct 13 '17 at 08:16
  • Yes, there are some contexts (XQuery `{x/y/z}` is the most notorious example; another is `instance of`) where there is no implicit atomization and it should therefore be done manually: using `string()` or `data()` is nearly always better than using `/text()`. – Michael Kay Oct 13 '17 at 08:22