I have a large XSD that I process with several templates to produce a new XSD. In one of the last steps I would like to determine the length of the XML (actually an XSD) that was captured in the variable xsdresult.

Using the string-length function I get a length that does not match the size of xsdresult: the XSD is over 52000 characters, but the output is Length: 9862. What am I doing wrong?

   <!-- Catch output in variable -->
   <xsl:variable name="xsdresult">
        <xsl:call-template name="start"/>
   </xsl:variable>

   <xsl:template name="start">
      <xsl:apply-templates/>
   </xsl:template>


   <!-- Build required doc parts -->
   <xsl:variable name="docparts">
        <xsl:call-template name="builddocparts"/>
   </xsl:variable>

   <xsl:template name="builddocparts">
        Length: <xsl:value-of select="string-length(normalize-unicode($xsdresult))"/>
   </xsl:template>
...
Pigna
  • Probably the `string-length` returns the sum of the lengths of all `text()` nodes and excludes the chars of all elements and attributes of your XSD. – zx485 Apr 13 '16 at 14:09
  • Your variable has as its value a result tree fragment (XSLT 1.0) or a temporary document (XSLT 2.0) of nodes but not a serialization of the schema which you seem to expect. – Martin Honnen Apr 13 '16 at 14:25
  • Possible duplicate of [XSLT: How to convert XML Node to String](http://stackoverflow.com/questions/6696382/xslt-how-to-convert-xml-node-to-string) – kjhughes Apr 13 '16 at 14:31
  • This is indeed a pointer in the right direction. I need to find a correct way to translate everything from the XSD to a string. I seem to lose namespaces and comments currently. – Pigna Apr 14 '16 at 13:03

1 Answer

A call to string-length() is equivalent to a call to string-length(.), which in turn coerces the current node to a string, so it is equivalent to string-length(string(.)). The same conversion applies when you pass a node explicitly, as in string-length($xsdresult). The value of the string() function is the string value of the node, which for an element node is the string formed by the concatenation of all descendant text nodes; markup such as tags, attributes, and comments contributes nothing.
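
To make the difference concrete, here is a minimal sketch. It assumes an XSLT 3.0 processor, where the XPath 3.0 function serialize() is available; XSLT 1.0 and 2.0 have no built-in equivalent. The inline schema is a made-up stand-in for the real contents of xsdresult.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical stand-in for the schema captured in the question -->
  <xsl:variable name="xsdresult">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="a" type="xs:string">
        <xs:annotation>
          <xs:documentation>Example</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:schema>
  </xsl:variable>

  <xsl:template match="/">
    <!-- String value: only the descendant text nodes are counted -->
    <xsl:text>String value length: </xsl:text>
    <xsl:value-of select="string-length($xsdresult)"/>
    <!-- Serialized form: tags, attributes and namespace declarations count too -->
    <xsl:text>&#10;Serialized length: </xsl:text>
    <xsl:value-of select="string-length(serialize($xsdresult))"/>
  </xsl:template>

</xsl:stylesheet>
```

The first figure reflects only the text inside xs:documentation, while the second covers the markup as well. The serialized length still depends on the serialization parameters in effect (indentation, XML declaration), and it counts characters rather than bytes, so treat it as an estimate of the on-disk size.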

If you want to know the minimum amount of space the serialized XML document will take on disk, given a simple serialization, then you must add the following to that string length:

  • For each non-empty element, the length of its start-tag: the length of the element type name, plus 2 for the start-tag delimiters < ... >, plus the sum of the lengths of the attribute-value specifications.
  • For each attribute-value specification, you will need one character for leading whitespace, plus the length of the attribute name, plus the string length of the attribute's value, plus three for the equal sign and quotation marks, plus five characters for each time a quotation mark is replaced by &apos; or &quot;.
  • For each non-empty element, the length of its end-tag (length of its element type name plus 3).
  • For each empty element, the length of its sole tag (length of its element type name, plus length of its attribute-value specifications, plus 3).
  • For each occurrence of < in the data or in attribute values, three characters for the escaping as &lt;.
  • For each occurrence of ampersand in the data or in attribute values, four characters for escaping as &amp;.
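
As a worked example of these tallies, take the following made-up fragment (it is not taken from the question's schema):

```xml
<a b="c">x<e/></a>
```

  • String value ("x"): 1.
  • Start-tag of a: 1 for the name, plus 2 for the delimiters, plus 6 for the attribute b (1 for whitespace, 1 for the name, 1 for the value, 3 for the equal sign and quotation marks), giving 9.
  • End-tag of a: 1 for the name plus 3, giving 4.
  • Empty element e: 1 for the name, plus 0 for attributes, plus 3, giving 4.

The total is 1 + 9 + 4 + 4 = 18, which matches the 18 characters of the fragment as written.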

Not part of the minimum amount, but possibly part of the space you'll need on disk:

  • The total width of any whitespace added, if you indent the XML structurally.
  • The number of CDATA marked sections you serialize, times 12 (for <![CDATA[ + ]]>).
  • The number of characters saved by using CDATA marked sections instead of &lt; and &amp;.
C. M. Sperberg-McQueen
  • Might also need to consider namespace declarations, and namespace prefixes on elements or even attributes. – Flynn1179 Apr 13 '16 at 20:15