7

I'd like to convert XML into CSV using an XSLT, but when applying the XSL from the SO thread titled XML To CSV XSLT against my input:

<WhoisRecord>
  <DomainName>127.0.0.1</DomainName>
  <RegistryData>
    <AbuseContact>
      <Email>abuse@iana.org</Email>
      <Name>Internet Corporation for Assigned Names and Number</Name>
      <Phone>+1-310-301-5820</Phone>
    </AbuseContact>
    <AdministrativeContact i:nil="true"/>
    <BillingContact i:nil="true"/>
    <CreatedDate/>
    <RawText>...</RawText>
    <Registrant>
      <Address>4676 Admiralty Way, Suite 330</Address>
      <City>Marina del Rey</City>
      <Country>US</Country>
      <Name>Internet Assigned Numbers Authority</Name>
      <PostalCode>90292-6695</PostalCode>
      <StateProv>CA</StateProv>
    </Registrant>
    <TechnicalContact>
      <Email>abuse@iana.org</Email>
      <Name>Internet Corporation for Assigned Names and Number</Name>
      <Phone>+1-310-301-5820</Phone>
    </TechnicalContact>
    <UpdatedDate>2010-04-14</UpdatedDate>
    <ZoneContact i:nil="true"/>
  </RegistryData>
</WhoisRecord>

I end up with:

  abuse@iana.orgInternet Corporation for Assigned Names and Number+1-310-301-5820,
    ,
    ,
    ,
    ...,      
    4676 Admiralty Way, Suite 330Marina del ReyUSInternet Assigned Numbers Authority90292-6695CA,      
    abuse@iana.orgInternet Corporation for Assigned Names and Number+1-310-301-5820,      
    2010-04-14,

My problem is that the resulting transformation is missing nodes (like the DomainName element containing the IP address) and some child nodes are concatenated without commas (like the children of AbuseContact).

I'd like to see all the XML output in CSV form, and strings like: "abuse@iana.orgInternet Corporation for Assigned Names and Number+1-310-301-5820," delimited by commas.

My XSL is pretty rusty. Your help is appreciated. :)

Here's the XSL I'm using:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:strip-space elements="*" />

<xsl:template match="/*/child::*">
  <xsl:for-each select="child::*">
    <xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>,    </xsl:if>
    <xsl:if test="position()  = last()"><xsl:value-of select="normalize-space(.)"/><xsl:text>
</xsl:text>
  </xsl:if>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
Adam Kahtava

2 Answers

5

This simple transformation produces the wanted result:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:strip-space elements="*"/>

 <!-- Process every text node in document order. -->
 <xsl:template match="/">
   <xsl:apply-templates select="//text()"/>
 </xsl:template>

 <!-- Emit the text, followed by a comma unless it is the last node. -->
 <xsl:template match="text()">
   <xsl:copy-of select="."/>
   <xsl:if test="not(position()=last())">,</xsl:if>
 </xsl:template>
</xsl:stylesheet>

Do note the use of:

 <xsl:strip-space elements="*"/>

to discard any white-space-only text nodes.

Update: AJ raised the problem that the results should be grouped in records/tuples per line. The question does not define what exactly a record/tuple should be. Therefore the current solution solves the two problems of white-space-only text nodes and of missing commas, but does not aim to group the output into records/tuples.
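
For illustration only, here is a minimal sketch of one possible record definition. It assumes (this is not stated in the question) that a record is any element that has child elements but no grandchildren, so AbuseContact, Registrant and TechnicalContact would each become one line. Fields outside such groups (DomainName, RawText, UpdatedDate) are not emitted, and values containing commas are not quoted. It also assumes the real input declares the i prefix, which the excerpt in the question omits.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <!-- Assumed record: an element with child elements but no grandchildren. -->
   <xsl:for-each select="//*[*][not(*/*)]">
     <!-- One field per child element, comma-separated. -->
     <xsl:for-each select="*">
       <xsl:value-of select="normalize-space(.)"/>
       <xsl:if test="position() != last()">,</xsl:if>
     </xsl:for-each>
     <!-- End the record with a line feed. -->
     <xsl:text>&#10;</xsl:text>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

A different record definition would only need a different select expression in the outer loop, since no element is referenced by name.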

U. Windl
Dimitre Novatchev
  • Doesn't CSV require a new line to separate a set/tuple of records? –  May 17 '10 at 18:58
  • It is not clear from the question what constitutes a tuple of records -- this has meaning in the relational DB world, but for a tree needs to be explicitly defined. I also edited my answer to reflect your comment. – Dimitre Novatchev May 17 '10 at 19:34
  • Thanks guys! I would like a set/tuple of records. How hard would that be? I'd also like to be able to apply the XSL to similarly structured XML documents - solutions that don't reference elements by name are preferred. Thanks again. :) – Adam Kahtava May 18 '10 at 00:06
  • @Adam-Kahtava: It shouldn't be difficult to implement tuples/sets once you define what a tuple should consist of. – Dimitre Novatchev May 18 '10 at 01:05
  • *chuckles at the irony in these comments* ;-) – Tomalak May 18 '10 at 09:28
  • Some fields (well, the address element) contain commas, so you probably need to check for this and enclose the field in quotation marks. And if the field contains quotation marks, I believe these have to become double quotation marks. – Tim C May 18 '10 at 09:41
  • @Dimitre your initial solution meets my needs (no need to complicate things). Your XSL is now part of a custom CSV behavior for WCF, view the behavior in action: http://adam.kahtava.com/services/whois.csv The custom behavior along with your XSL can be found here: http://code.google.com/p/adamdotcom-services/source/browse/trunk/AdamDotCom.Common.Service/Source/Common/Infrastructure/CSV/ – Adam Kahtava May 18 '10 at 16:06
  • @Adam-Kahtava: Glad this was useful. :) – Dimitre Novatchev May 18 '10 at 16:36
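
The quoting concern raised in the comments above (commas inside the Address value, and embedded quotation marks) could be handled along these lines. This is only a sketch: it wraps every field in quotation marks rather than only those that need it, and the template name double-quotes and its value parameter are made up for this example.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <xsl:apply-templates select="//text()"/>
 </xsl:template>

 <!-- Wrap each field in quotation marks so embedded commas are safe. -->
 <xsl:template match="text()">
   <xsl:text>"</xsl:text>
   <xsl:call-template name="double-quotes">
     <xsl:with-param name="value" select="."/>
   </xsl:call-template>
   <xsl:text>"</xsl:text>
   <xsl:if test="position() != last()">,</xsl:if>
 </xsl:template>

 <!-- Hypothetical helper: writes $value with every " replaced by "". -->
 <xsl:template name="double-quotes">
   <xsl:param name="value"/>
   <xsl:choose>
     <xsl:when test="contains($value, '&quot;')">
       <xsl:value-of select="substring-before($value, '&quot;')"/>
       <xsl:text>""</xsl:text>
       <!-- Recurse on the remainder after the first quotation mark. -->
       <xsl:call-template name="double-quotes">
         <xsl:with-param name="value" select="substring-after($value, '&quot;')"/>
       </xsl:call-template>
     </xsl:when>
     <xsl:otherwise>
       <xsl:value-of select="$value"/>
     </xsl:otherwise>
   </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

XSLT 1.0 has no built-in string replace function, which is why the helper recurses over substring-before and substring-after.
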
0

I believe that you need a recursive solution to approach this problem. So, you'd require something that keeps diving into the tree until it reaches a text() node. If that text() node is actually a child of the last node, then it puts a new line. Otherwise, it just puts the value with a comma.

If the node does not have a text() node as its child, then recursively dig into that subtree.

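<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*" />

<!-- Entry point: start walking from the document node. -->
<xsl:template match="/">
    <xsl:call-template name="rec"/>
</xsl:template>

<xsl:template name="rec">
    <!-- Declared so the recursive call can pass it; the loop works on the context node. -->
    <xsl:param name="node"/>
    <xsl:for-each select="child::*">
        <xsl:choose>
            <!-- Element with its own text: emit it quoted. -->
            <xsl:when test="child::text()">
                <xsl:choose>
                    <!-- UpdatedDate is hard-coded as the last field of the record. -->
                    <xsl:when test="local-name(.) != 'UpdatedDate'">"<xsl:value-of select="normalize-space(.)"/>", </xsl:when>
                    <xsl:otherwise>"<xsl:value-of select="normalize-space(.)"/>" <xsl:text>&#xD;</xsl:text></xsl:otherwise>
                </xsl:choose>
            </xsl:when>
            <!-- Element with child nodes but no text of its own: recurse into it. -->
            <xsl:when test="child::node()">
                <xsl:call-template name="rec">
                    <xsl:with-param name="node" select="child::node()"/>
                </xsl:call-template>
            </xsl:when>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>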

This is not foolproof, but it produced this result on my end with Saxon:

"127.0.0.1", "abuse@iana.org", "Internet Corporation for Assigned Names and Number", "+1-310-301-5820", "...", "4676 Admiralty Way, Suite 330", "Marina del Rey", "US", "Internet Assigned Numbers Authority", "90292-6695", "CA", "abuse@iana.org", "Internet Corporation for Assigned Names and Number", "+1-310-301-5820", "2010-04-14"

Hope this helps.

  • Why was my answer voted down? A comment about it would have been helpful. I am new to XSLT myself. –  May 17 '10 at 18:54
  • Probably because there is no explicit recursion or looping needed to move along the child axis. – Tomalak May 18 '10 at 09:30
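
To illustrate the point in the last comment, here is a sketch that leaves the tree walking to apply-templates instead of a named recursive template. It assumes a field is any element that has no element children but does contain text. Under that assumption it should emit the same quoted fields in the same order as the answer above, without hard-coding UpdatedDate as the last field.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*" />

<xsl:template match="/">
    <!-- Assumed field: an element with text and no element children. -->
    <xsl:apply-templates select="//*[not(*)][text()]"/>
    <!-- Close the single output line. -->
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<!-- Applied only to the elements selected above, in document order. -->
<xsl:template match="*">
    <xsl:text>"</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>"</xsl:text>
    <!-- position() and last() refer to the selected node list. -->
    <xsl:if test="position() != last()">
        <xsl:text>, </xsl:text>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

Because the separator is decided by position() within the selected node list, the stylesheet does not need to know any element names.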