81

I have the following XML document:

<projects>
  <project>
   <name>Shockwave</name> 
   <language>Ruby</language> 
   <owner>Brian May</owner> 
   <state>New</state> 
   <startDate>31/10/2008 0:00:00</startDate> 
  </project>
  <project>
   <name>Other</name> 
   <language>Erlang</language> 
   <owner>Takashi Miike</owner> 
   <state> Canceled </state> 
   <startDate>07/11/2008 0:00:00</startDate> 
  </project>
...

And I'd like to get this from the transformation (XSLT) result:

Shockwave,Ruby,Brian May,New,31/10/2008 0:00:00
Other,Erlang,Takashi Miike,Cancelled,07/11/2008 0:00:00

Does anyone know the XSLT to achieve this? I'm using .net in case that matters.

Micha Wiedenmann
  • 19,979
  • 21
  • 92
  • 137
Pablo Fernandez
  • 103,170
  • 56
  • 192
  • 232
  • .NET only matters if you are using the [XslTransform Class](http://msdn.microsoft.com/en-us/library/system.xml.xsl.xsltransform%28v=vs.110%29.aspx) that only supports xslt 1.0. Is this a constraint? If so, it should be re-tagged to [xslt-1.0](http://stackoverflow.com/tags/xslt-1.0/info). – Ryan Gates Feb 06 '13 at 17:22
  • A good answer is also given here to a similar question if you use Linux http://askubuntu.com/questions/174143/convert-xml-to-csv-shell-command-linee – Leonard Saers Mar 09 '17 at 10:58
  • There is the tool [xml2csv](https://github.com/fordfrog/xml2csv). Maybe, this is useful in your case, too? – koppor Jul 01 '17 at 04:35
  • xml2csv worked for me just fine. I found that the documentation is broken, but took a moment to document the work-around which produced usable results for me, here: https://github.com/fordfrog/xml2csv/issues/5#issuecomment-726542532 – Hugh Esco Nov 13 '20 at 06:08

4 Answers4

58

Here is a version with configurable parameters that you can set programmatically:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8" />

  <xsl:param name="delim" select="','" />
  <xsl:param name="quote" select="'&quot;'" />
  <xsl:param name="break" select="'&#xA;'" />

  <xsl:template match="/">
    <xsl:apply-templates select="projects/project" />
  </xsl:template>

  <xsl:template match="project">
    <xsl:apply-templates />
    <xsl:if test="following-sibling::*">
      <xsl:value-of select="$break" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="*">
    <!-- remove normalize-space() if you want keep white-space at it is --> 
    <xsl:value-of select="concat($quote, normalize-space(), $quote)" />
    <xsl:if test="following-sibling::*">
      <xsl:value-of select="$delim" />
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()" />
</xsl:stylesheet>
Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • 1
    I like the mandatory quoting. At least when importing into, Excel, it takes care of the case where there is a $delim in the original data. – bortzmeyer Dec 15 '08 at 08:11
  • What we need to do if we want to also include the column names?? – omer khalid May 12 '20 at 08:51
  • 1
    @omer There are a couple of ways to do this, depending on your XML. It's best if you ask a new question, because the comment section is not a good place to discuss things like this, and because in *this* thread it was not part of the question, so I won't edit the answer. – Tomalak May 12 '20 at 09:06
47

Found an XML transform stylesheet here (wayback machine link, site itself is in german)

The stylesheet added here could be helpful:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:strip-space elements="*" />

<xsl:template match="/*/child::*">
<xsl:for-each select="child::*">
<xsl:if test="position() != last()">"<xsl:value-of select="normalize-space(.)"/>",    </xsl:if>
<xsl:if test="position()  = last()">"<xsl:value-of select="normalize-space(.)"/>"<xsl:text>&#xD;</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Perhaps you want to remove the quotes inside the xsl:if tags so it doesn't put your values into quotes, depending on where you want to use the CSV file.

schnaader
  • 49,103
  • 10
  • 104
  • 136
  • 6
    Be careful, if there is a comma in the original data, it is not escaped. You may want to add a test with contains() and an escape with translate(). – bortzmeyer Dec 15 '08 at 08:11
  • 2
    I don't think this handles a double quote in the data. To escape a double-quote you must substitute it with two double-quotes. – Sarel Botha Nov 10 '10 at 15:43
  • 1
    Normally, one only needs to enclose a value in quotes if it contains any of the following: the delimiter ('`,`'), the quote ('`"`'), a newline (` `). If quoting is required, any inner quotes must first be doubled ('`""`'). – mousio Apr 07 '11 at 08:57
  • See the edit in [this question](http://stackoverflow.com/questions/16594364/xml-to-csv-using-xslt) if you need to select attributes as well. – harpo May 17 '13 at 16:03
  • 2
    Correct new line on unix is ` ` (\n). ` ` is hexadecimal \r – igo Aug 05 '14 at 11:32
  • It works but not exactly as required. It adds new line after every column. Also is there a way to add header (in the first row)? I translate as below to me. Shockwave, Ruby, Brian May, New, 31/10/2008 0:00:00 Other, Erlang, Takashi Miike, Canceled, 07/11/2008 0:00:00 – user2739418 Feb 08 '16 at 11:36
  • The Link is dead – BotMaster3000 Sep 10 '19 at 07:35
  • 1
    @BotMaster3000: thanks, replaced with a wayback machine link – schnaader Sep 10 '19 at 08:18
21

This xsl:stylesheet can use a specified list of column headers and will ensure that the rows will be ordered correctly. It requires XSLT version 2.0.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:csv="csv:csv">
    <xsl:output method="text" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="delimiter" select="','"/>

    <csv:columns>
        <column>name</column>
        <column>sublease</column>
        <column>addressBookID</column>
        <column>boundAmount</column>
        <column>rentalAmount</column>
        <column>rentalPeriod</column>
        <column>rentalBillingCycle</column>
        <column>tenureIncome</column>
        <column>tenureBalance</column>
        <column>totalIncome</column>
        <column>balance</column>
        <column>available</column>
    </csv:columns>

    <xsl:template match="/property-manager/properties">
        <!-- Output the CSV header -->
        <xsl:for-each select="document('')/*/csv:columns/*">
                <xsl:value-of select="."/>
                <xsl:if test="position() != last()">
                    <xsl:value-of select="$delimiter"/>
                </xsl:if>
        </xsl:for-each>
        <xsl:text>&#xa;</xsl:text>
    
        <!-- Output rows for each matched property -->
        <xsl:apply-templates select="property"/>
    </xsl:template>

    <xsl:template match="property">
        <xsl:variable name="property" select="."/>
    
        <!-- Loop through the columns in order  -->
        <xsl:for-each select="document('')/*/csv:columns/*">
            <!-- Extract the column name and value  -->
            <xsl:variable name="column" select="."/>
            <xsl:variable name="value" select="$property/*[name() = $column]"/>
        
            <!-- Quote the value if required -->
            <xsl:choose>
                <xsl:when test="contains($value, '&quot;')">
                    <xsl:variable name="x" select="replace($value, '&quot;',  '&quot;&quot;')"/>
                    <xsl:value-of select="concat('&quot;', $x, '&quot;')"/>
                </xsl:when>
                <xsl:when test="contains($value, $delimiter)">
                    <xsl:value-of select="concat('&quot;', $value, '&quot;')"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="$value"/>
                </xsl:otherwise>
            </xsl:choose>
        
            <!-- Add the delimiter unless we are the last expression -->
            <xsl:if test="position() != last()">
                <xsl:value-of select="$delimiter"/>
            </xsl:if>
        </xsl:for-each>
    
        <!-- Add a newline at the end of the record -->
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
ioquatix
  • 1,411
  • 17
  • 32
  • 2
    This is nice, but it would not work. `replace()` is an XPath 2.0 function. In XSLT 1.0 you'd have to use a recursive string replace template. – Tomalak Jul 05 '12 at 12:48
  • 1
    Worked for me with xsltproc/libxslt - it was good enough. Thanks for pointing out the requirements though. – ioquatix Jul 11 '12 at 05:44
  • @hd1, I'm still using this script in production so probably you are doing something wrong? – ioquatix Jul 16 '14 at 14:50
  • @hd1, perhaps if you told me exactly what is going wrong I'd be able to help? – ioquatix Jul 17 '14 at 02:56
  • @hd1, perhaps you can explain what your problem was and how you worked around it so other developers may benefit from your knowledge :) – ioquatix Jul 23 '14 at 06:13
  • 2
    Didn't use xsl at all, and decided to use SAX – hd1 Jul 23 '14 at 06:15
  • If using Microsoft xml engine you can try msxsl script ` <![CDATA[ public string Replace(string stringToModify, string pattern, string replacement) { return stringToModify.Replace(pattern, replacement); } ]]> ` Then use `` and add these attributes to stylesheet node ` – vukis Aug 28 '18 at 07:42
8

This CsvEscape function is XSLT 1.0 and escapes column values ,, ", and newlines like RFC 4180 or Excel. It makes use of the fact that you can recursively call XSLT templates:

  • The template EscapeQuotes replaces all double quotes with 2 double quotes, recursively from the start of the string.
  • The template CsvEscape checks if the text contains a comma or double quote, and if so surrounds the whole string with a pair of double quotes and calls EscapeQuotes for the string.

Example usage: xsltproc xmltocsv.xslt file.xml > file.csv

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template name="EscapeQuotes">
    <xsl:param name="value"/>
    <xsl:choose>
      <xsl:when test="contains($value,'&quot;')">
    <xsl:value-of select="substring-before($value,'&quot;')"/>
    <xsl:text>&quot;&quot;</xsl:text>
    <xsl:call-template name="EscapeQuotes">
      <xsl:with-param name="value" select="substring-after($value,'&quot;')"/>
    </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
    <xsl:value-of select="$value"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="CsvEscape">
    <xsl:param name="value"/>
    <xsl:choose>
    <xsl:when test="contains($value,',')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:when test="contains($value,'&#xA;')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:when test="contains($value,'&quot;')">
      <xsl:text>&quot;</xsl:text>
      <xsl:call-template name="EscapeQuotes">
    <xsl:with-param name="value" select="$value"/>
      </xsl:call-template>
      <xsl:text>&quot;</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$value"/>
    </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="/">
    <xsl:text>project,name,language,owner,state,startDate</xsl:text>
    <xsl:text>&#xA;</xsl:text>
    <xsl:for-each select="projects/project">
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(name)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(language)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(owner)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(state)"/></xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="CsvEscape"><xsl:with-param name="value" select="normalize-space(startDate)"/></xsl:call-template>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Community
  • 1
  • 1
jmiserez
  • 2,991
  • 1
  • 23
  • 34