Disclaimer: the following is a sin against XML. That's why I'm trying to change it with XSLT :)

My XML currently looks like this:

<root>
    <object name="blarg" property1="shablarg" property2="werg".../>
    <object name="yetanotherobject" .../>
</root>

Yes, I'm putting all the textual data in attributes. I'm hoping XSLT can save me; I want to move toward something like this:

<root>
    <object>
        <name>blarg</name>
        <property1>shablarg</property1>
        ...
    </object>
    <object>
        ...
    </object>
</root>

I've actually got all of this working so far, with the exception that my sins against XML have been more... exceptional. Some of the tags look like this:

<object description = "This is the first line

This is the third line.  That second line full of whitespace is meaningful"/>

I'm using xsltproc under Linux, but it doesn't seem to have any option to preserve whitespace. I've tried xsl:preserve-space and xml:space="preserve" to no avail. Every option I've found seems to apply to whitespace within the elements themselves, not within attributes. Every single time, the above gets changed to:

This is the first line This is the third line.  That second line full of whitespace is meaningful

So the question is, can I preserve the attribute whitespace?

Mathias Müller
Atiaxi
  • You should replace the whitespace inside the attribute value with character references, like replace `` with ``. The attribute-value normalization (3.3.3) then depends on the attribute type, which I think is `CDATA` by default. However, I think you can force it with ` '>` (may or may not be correct). Then, if you have an XSL, you need to make sure to handle the whitespace manually; I did something similar to `string-join()` and `tokenize()`. – n611x007 Apr 21 '15 at 18:11
  • ***It can be done.*** You can get a full example ([SSCCE](http://www.sscce.org/ "Short, Self Contained, Correct (Compilable), Example")) out of my answer to another question: http://stackoverflow.com/a/29780972/611007 (As I explained above, it's not the way you are trying to do it, but in the end it will work the way you want.) – n611x007 Apr 21 '15 at 20:05
  • related: https://stackoverflow.com/questions/449627/ - related: https://stackoverflow.com/questions/2004386/ - related: https://stackoverflow.com/questions/1289524/ – n611x007 Apr 22 '15 at 10:58

4 Answers

This is actually a raw XML parsing problem, not something XSLT can help you with. An XML parser must convert the literal newlines in that attribute value to spaces, as per ‘3.3.3 Attribute-Value Normalization’ in the XML standard. So anything currently reading your description attributes and keeping the newlines in is doing it wrong.
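
A quick way to see this normalization is to parse the same attribute twice, once with literal newlines and once with character references. The sketch below uses Python's standard `xml.etree.ElementTree` (backed by expat) rather than xsltproc, and the helper name `read_description` is made up for illustration; other conforming parsers should behave the same way, but check that against your own toolchain.

```python
import xml.etree.ElementTree as ET


def read_description(xml_text: str) -> str:
    """Parse a single element and return its 'description' attribute."""
    return ET.fromstring(xml_text).get("description")


# Literal newlines inside the attribute value: the parser normalizes
# each one to a space before any stylesheet ever sees the value.
literal = read_description('<object description="line one\n\nline three"/>')

# The same newlines written as character references survive parsing.
escaped = read_description('<object description="line one&#10;&#10;line three"/>')

print(repr(literal))
print(repr(escaped))
```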

You may be able to recover the newlines by pre-processing the XML to escape the newlines to &#10; character references, as long as you haven't also got newlines where charrefs are disallowed, such as between attributes inside a tag. Charrefs should survive as control characters through to the attribute value, where you can then turn them into text nodes.
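
One way such a pre-processing pass might look is sketched below in Python. It works on the raw text, before any XML parser sees it, and rewrites newlines only inside quoted attribute values, leaving newlines between attributes alone. The function name `escape_attr_newlines` and both regular expressions are illustrative assumptions, not part of any library. The regex approach also assumes there are no comments or CDATA sections containing text that looks like a tag.

```python
import re

# One quoted attribute value, quotes included ('<' is not allowed inside).
ATTR_VALUE = re.compile(r'"[^"<]*"|\'[^\'<]*\'')

# A start tag or empty-element tag whose attributes are all quoted.
START_TAG = re.compile(
    r'<[A-Za-z_:][^\s<>/]*'                                # element name
    r'(?:\s+[^\s=<>/]+\s*=\s*(?:"[^"<]*"|\'[^\'<]*\'))*'   # attributes
    r'\s*/?>'                                              # end of tag
)


def escape_attr_newlines(raw_xml: str) -> str:
    """Replace newlines inside attribute values with &#10; references."""

    def fix_value(match: re.Match) -> str:
        value = match.group(0)
        # Treat CRLF and bare CR as a single line break first.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        return value.replace("\n", "&#10;")

    def fix_tag(match: re.Match) -> str:
        # Only the quoted values inside the tag are rewritten; whitespace
        # between attributes is left untouched.
        return ATTR_VALUE.sub(fix_value, match.group(0))

    return START_TAG.sub(fix_tag, raw_xml)
```

The output can be written to a new file and handed to the XSLT processor in place of the original.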

bobince
  • I'm not sure this will work. Charrefs get replaced by the bytes they represent by the XML processor, and so a charref referring to a whitespace character (like LINE FEED) will be normalized as whitespace. – ChuckB Nov 04 '08 at 17:03
  • The standard and DOM Test Suite say it works; Your Implementation May Vary, but the ones I've tested do. – bobince Jan 08 '09 at 02:21
  • @ChuckB I think it depends on *whether you can control your XML processor*. I can create good output with an `.xsl` that works in both Saxon and Firefox. – n611x007 Apr 21 '15 at 20:11
  • That same [section of the XML specification](https://www.w3.org/TR/REC-xml/#AVNormalize) specifically notes that character references such as ` ` **do work**: "if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself". Of course, in practice, it'll only work in compliant implementations... – MvanGeest Jun 25 '16 at 23:10

According to the Annotated XML Spec, white space in attribute values is normalized by the XML processor (see the (T) annotation on 3.3.3). So, it looks like the answer is probably no.

James Sulak

As others have pointed out, the XML spec doesn't allow for the preservation of spaces in attributes. In fact, this is one of the few differentiators between what you can do with attributes and elements (the other main one being that elements can contain other tags while attributes cannot).

You will have to process the file outside of XML first in order to preserve the spaces.

Ned Batchelder
  • I think this is misleading. If you can control your XML processor, it seems valid and possible to preserve that whitespace. I was able to achieve the result. – n611x007 Apr 21 '15 at 20:13

If you can control your XML processor, you can do it.

From my other answer (which has many references linked):

If you have XML like

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE elemke [
<!ATTLIST brush wood CDATA #REQUIRED>
]>

<elemke>
<brush wood="guy&#xA;threep"/>
</elemke>

and an XSL like

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet  version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template name="split">
  <xsl:param name="list"      select="''" />
  <xsl:param name="separator" select="'&#xA;'" />
  <xsl:if test="not($list = '' or $separator = '')">
    <xsl:variable name="head" select="substring-before(concat($list, $separator), $separator)" />
    <xsl:variable name="tail" select="substring-after($list, $separator)" />

    <xsl:value-of select="$head"/>
    <br/><xsl:text>&#xA;</xsl:text>
    <xsl:call-template name="split">
        <xsl:with-param name="list"      select="$tail" />
        <xsl:with-param name="separator" select="$separator" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>


<xsl:template match="brush">
  <html>
  <xsl:call-template name="split">
    <xsl:with-param name="list" select="@wood"/>
  </xsl:call-template>
  </html>
</xsl:template>

</xsl:stylesheet>

you can get a html like:

<html>guy<br>
   threep<br>

</html>  

as tested/produced with a processor like this saxon command line:

java -jar saxon9he.jar -s:in.xml -xsl:in.xsl -o:out.html
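
For comparison, here is a rough sketch of the attribute-to-element conversion from the question done outside XSLT, in Python with the standard `xml.etree.ElementTree`. It assumes the newlines in the input are already written as character references (as in the `wood` attribute above), so they survive parsing. The function name `attributes_to_elements` and the hard-coded `object` tag are illustrative choices, not anything from the original posts.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(xml_text: str) -> str:
    """Move every attribute of each <object> into a child element."""
    root = ET.fromstring(xml_text)
    # Snapshot the matches first, since the tree is modified in the loop.
    for obj in list(root.iter("object")):
        for name, value in list(obj.attrib.items()):
            child = ET.SubElement(obj, name)
            # Newlines that arrived as &#10; are ordinary characters here
            # and are written out as-is in element content.
            child.text = value
        obj.attrib.clear()
    return ET.tostring(root, encoding="unicode")


sample = '<root><object name="blarg" description="first&#10;&#10;third"/></root>'
print(attributes_to_elements(sample))
```

Element content is not subject to attribute-value normalization, so once the text lives in a child element the blank line no longer needs escaping.
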
n611x007
  • The `ATTLIST` and the `DOCTYPE` here are actually unneeded; CDATA would be the default 'attribute type' for this [`AttValue`](http://www.jelks.nu/XML/xmlebnf.html#NT-AttValue). – n611x007 Apr 21 '15 at 20:20
  • FYI a random post on processor vs parser: http://www.oxygenxml.com/archives/xsl-list/200009/msg00750.html – n611x007 Apr 21 '15 at 20:23
  • Credit to [Tomalak](https://stackoverflow.com/a/2850181/611007) for the 'string' template because in my target xml processor [`tokenize`](http://www.w3.org/TR/xpath-functions/#func-tokenize) was unavailable. – n611x007 Apr 21 '15 at 20:24