We can parse this test XML file with this XSL file fine:

Test XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="newrows.xsl" type="text/xsl"?>
<Workbook>
    <Worksheet>
        <Table>
            <Row>
                <Cell></Cell>
                <Cell>(info...)</Cell>
                <Cell></Cell>
            </Row>
            <Row>
                <Cell>first name</Cell>
                <Cell>last name</Cell>
                <Cell>age</Cell>
            </Row>
            <Row>
                <Cell>Jim</Cell>
                <Cell>Smith</Cell>
                <Cell>34</Cell>
            </Row>
            <Row>
                <Cell>Roy</Cell>
                <Cell>Rogers</Cell>
                <Cell>22</Cell>
            </Row>
            <Row>
                <Cell>(info...)</Cell>
                <Cell></Cell>
                <Cell>(info...)</Cell>
            </Row>

            <Row>
                <Cell>Sally</Cell>
                <Cell>Cloud</Cell>
                <Cell>26</Cell>
            </Row>

            <Row>
                <Cell>John</Cell>
                <Cell>Randall</Cell>
                <Cell>44</Cell>
            </Row>  

        </Table>
    </Worksheet>
</Workbook>

XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:param name="range-1-begin"  select="1"/>
    <xsl:param name="range-1-end"  select="3"/>

    <xsl:param name="range-2-begin"  select="5"/>
    <xsl:param name="range-2-end"  select="6"/>

    <xsl:template match="Table">
        <test>
            <xsl:for-each select="Row">
                <xsl:if test="(position() &gt;= $range-1-begin and position() &lt;= $range-1-end)
                    or (position() &gt;= $range-2-begin and position() &lt;= $range-2-end)">
                    <Row>
                       <xsl:for-each select="Cell">
                            <xsl:if test="position() = 1 or position() = 3">
                                <Cell>
                                    <xsl:value-of select="."/>
                                </Cell>
                            </xsl:if>
                        </xsl:for-each>
                    </Row>
                </xsl:if>
            </xsl:for-each>
        </test>
    </xsl:template>

</xsl:stylesheet>

However, when we try to parse this similar XML file exported from Excel, the output contains the text content of every element with no XML element tags. We can even type in kksljflskdjf instead of Table in the match pattern and it still outputs the content of every XML element.

What do I have to change in the XML/XSL file so that the XSL file correctly parses the XML?

Excel XML (excerpts):

<?xml version="1.0"?>
<?xml-stylesheet href="blackbox.xsl" type="text/xsl"?>
<Workbook 
xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" 
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
    <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
        <Author>MM</Author>
        <LastAuthor>xx</LastAuthor>
        ...
<Worksheet ss:Name="OFFSET Individual">
        <Names>
            <NamedRange ss:Name="_FilterDatabase" ss:RefersTo="='OFFSET Individual'!R3C2:R3C12" ss:Hidden="1"/>
            <NamedRange ss:Name="Print_Area" ss:RefersTo="='OFFSET Individual'!R4C2:R435C15"/>
            <NamedRange ss:Name="Muster" ss:RefersTo="='OFFSET Individual'!C1:C9"/>
            <NamedRange ss:Name="PAP" ss:RefersTo="='OFFSET Individual'!C2"/>
        </Names>
        <Table ss:ExpandedColumnCount="31" ss:ExpandedRowCount="443" x:FullColumns="1" x:FullRows="1" ss:StyleID="s90" ss:DefaultColumnWidth="59" ss:DefaultRowHeight="15">
            <Column ss:StyleID="s416" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="61"/>
            <Column ss:StyleID="s91" ss:AutoFitWidth="0" ss:Width="287"/>
            <Column ss:StyleID="s547" ss:AutoFitWidth="0" ss:Width="216"/>
            <Column ss:StyleID="s91" ss:AutoFitWidth="0" ss:Width="87"/>
            <Column ss:StyleID="s92" ss:AutoFitWidth="0" ss:Width="202"/>
            <Column ss:StyleID="s90" ss:AutoFitWidth="0" ss:Width="87"/>
            <Column ss:StyleID="s101" ss:AutoFitWidth="0" ss:Width="284"/>
            <Column ss:StyleID="s132" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="52"/>
            <Column ss:StyleID="s137" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="47"/>
            <Column ss:StyleID="s90" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="42"/>
            <Column ss:StyleID="s90" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="39"/>
            <Column ss:StyleID="s90" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="37"/>
            <Column ss:StyleID="s113" ss:AutoFitWidth="0" ss:Width="47"/>
            <Column ss:StyleID="s87" ss:Hidden="1" ss:AutoFitWidth="0" ss:Width="275"/>
            <Column ss:StyleID="s458" ss:AutoFitWidth="0" ss:Width="89"/>
            <Column ss:StyleID="s179" ss:AutoFitWidth="0" ss:Span="1"/>
            <Column ss:Index="18" ss:StyleID="s168" ss:Hidden="1" ss:AutoFitWidth="0"/>
            <Column ss:StyleID="s90" ss:Hidden="1" ss:AutoFitWidth="0"/>
            <Column ss:StyleID="s377" ss:AutoFitWidth="0" ss:Width="202" ss:Span="2"/>
            <Column ss:Index="23" ss:StyleID="s377" ss:AutoFitWidth="0" ss:Width="203"/>
            <Row ss:AutoFitHeight="0" ss:Height="23">
                <Cell ss:Index="2" ss:StyleID="s142">
                    <Data ss:Type="String">Paper Overview</Data>
                    <NamedCell ss:Name="PAP"/>
                    <NamedCell ss:Name="Muster"/>
                </Cell>
            </Row>
            <Row ss:AutoFitHeight="0">
                <Cell ss:Index="2" ss:StyleID="s141">
                    <Data ss:Type="String">Stand: 10.03.2011; 13:00 Uhr</Data>
                    <NamedCell ss:Name="PAP"/>
                    <NamedCell ss:Name="Muster"/>
                </Cell>
            </Row>
                        ...

Here is an example of the resulting "XML" file:

[Screenshot of the transformation output; image not preserved]

Addendum

This is the full solution which now works, thanks @Dimitre!

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:y="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:x="urn:schemas-microsoft-com:office:excel" 
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
    xmlns:html="http://www.w3.org/TR/REC-html40"
  exclude-result-prefixes="y o x ss html"
 >

 <xsl:strip-space elements="*"/>
    <xsl:output method="xml" indent="yes"/>

    <xsl:param name="range-1-begin"  select="1"/>
    <xsl:param name="range-1-end"  select="3"/>

    <xsl:param name="range-2-begin"  select="5"/>
    <xsl:param name="range-2-end"  select="6"/>

    <xsl:template match="text()"/> 

    <xsl:template match="y:Table">
        <test>
            <xsl:for-each select="y:Row">
                <xsl:if test="(position() &gt;= $range-1-begin and position() &lt;= $range-1-end)
                    or (position() &gt;= $range-2-begin and position() &lt;= $range-2-end)">
                    <Row>
                       <xsl:for-each select="y:Cell">
                            <xsl:if test="position() = 1 or position() = 3">
                                <Cell>
                                    <xsl:value-of select="."/>
                                </Cell>
                            </xsl:if>
                        </xsl:for-each>
                    </Row>
                </xsl:if>
            </xsl:for-each>
        </test>
    </xsl:template>

</xsl:stylesheet>
Edward Tanguay
  • One of many possible duplicates of [XSLT with XML source that has a default namespace set to xmlns](http://stackoverflow.com/questions/1344158/xslt-with-xml-source-that-has-a-default-namespace-set-to-xmlns) –  Mar 11 '11 at 13:45

2 Answers

What do I have to change in the XML/XSL file so that the XSL file correctly parses the XML?

First of all, your terminology is incorrect. An XSLT transformation is applied to an already parsed XML document. Parsing (by an XML parser) is a prerequisite for applying a transformation.

This is the most frequently asked question about XML, XPath, and XSLT:

You cannot select any element by name in the second document because it defines a default namespace (xmlns="urn:schemas-microsoft-com:office:spreadsheet").

In XPath any unprefixed name is considered to be in "no namespace". Therefore the template matching Table and the <xsl:for-each> selecting Row elements will not match/select any element, because in the XML document there are no such elements that are in "no namespace".

The most readable solution is to define the same namespaces in the XSLT stylesheet and to use prefixed names in any XPath expression/match-pattern.

Thus, in the corrected XSLT stylesheet you will have:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:y="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40"
  exclude-result-prefixes="y o x ss html"
 >
    <xsl:output method="xml" indent="yes"/>

    <xsl:param name="range-1-begin"  select="1"/>
    <xsl:param name="range-1-end"  select="3"/>
    <xsl:param name="range-2-begin"  select="5"/>
    <xsl:param name="range-2-end"  select="6"/>

    <xsl:template match="y:Table">
        <test>
            <xsl:for-each select="y:Row">
                <xsl:if test="(position() &gt;= $range-1-begin and position() &lt;= $range-1-end)
                    or (position() &gt;= $range-2-begin and position() &lt;= $range-2-end)">
                    <Row>
                        <xsl:for-each select="y:Cell">
                            <xsl:if test="position() = 1 or position() = 3">
                                <Cell>
                                    <xsl:value-of select="."/>
                                </Cell>
                            </xsl:if>
                        </xsl:for-each>
                    </Row>
                </xsl:if>
            </xsl:for-each>
        </test>
    </xsl:template>
</xsl:stylesheet>
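
If declaring a prefix is not practical, a namespace-agnostic variant can test `local-name()` instead. The following is only a sketch of that alternative, not part of the stylesheet above: it leaves out the row and cell position filters for brevity, and it assumes the input contains a single Table element so that the result has one root. It is more verbose than the prefixed form and will also match same-named elements from any other namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" indent="yes"/>

    <!-- Override the built-in rule that copies text nodes to the output. -->
    <xsl:template match="text()"/>

    <!-- Match any element whose local name is Table, whatever its namespace. -->
    <xsl:template match="*[local-name() = 'Table']">
        <test>
            <xsl:for-each select="*[local-name() = 'Row']">
                <Row>
                    <xsl:for-each select="*[local-name() = 'Cell']">
                        <Cell>
                            <xsl:value-of select="."/>
                        </Cell>
                    </xsl:for-each>
                </Row>
            </xsl:for-each>
        </test>
    </xsl:template>

</xsl:stylesheet>
```

Because this stylesheet declares no default namespace, the `test`, `Row`, and `Cell` elements it writes are in no namespace.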
Dimitre Novatchev
  • This gets me farther, but it still includes large amounts of whitespace(?) above and below the correctly outputted XML. How can I tell it to *just* output the XML I need, i.e. how can I tell the XSL not to identify all of this extra output? – Edward Tanguay Mar 11 '11 at 14:17
  • @Edward-Tanguay: Glad that I could help -- you are welcome. Maybe you could consider accepting the answer? :) – Dimitre Novatchev Mar 11 '11 at 14:19
  • But it doesn't work 100% yet: with your block of namespaces and the y:Table, y:Row (and I had to do a y:Cell change as well), I get the correct XML in the *middle* of a very long file of mostly whitespace and various data from various cells, as if it still isn't matching up a namespace in the XML file. I even copied the namespaces 1-to-1 and it still outputs the same long file with lots of whitespace. Where is this coming from? – Edward Tanguay Mar 11 '11 at 14:23
  • for instance, it outputs "MM" followed by lots of whitespace then "xx" followed by lots of whitespace which is coming from this XML: `<Author>MM</Author> <LastAuthor>xx</LastAuthor>` – Edward Tanguay Mar 11 '11 at 14:29
  • @Edward-Tanguay: One thing to note is that the new XML document is quite different from the initial one, and when the structure of the XML document changes significantly you cannot expect your old transformation to keep producing the expected results. As for the white space, add at global level `<xsl:strip-space elements="*"/>` and you can also try adding this template: `<xsl:template match="text()"/>` – Dimitre Novatchev Mar 11 '11 at 14:46
  • That was it, after adding those two statements it exports perfectly now, thanks! – Edward Tanguay Mar 11 '11 at 15:10
  • @Edward-Tanguay: You are welcome. It is likely that you don't fully understand why these two instructions were necessary and what they are causing -- it would be best if you ask a separate question about this -- you'll get good answers. – Dimitre Novatchev Mar 11 '11 at 15:34
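
For context on why the two extra instructions help: XSLT 1.0 defines built-in template rules that the processor applies whenever no explicit template matches a node. Written out as a stylesheet, the defaults look roughly like this (a sketch of the rules described in the XSLT 1.0 recommendation, shown for illustration only; you do not need to add them yourself):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Built-in rule for the root node and for elements: recurse into the children. -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Built-in rule for text and attribute nodes: copy the string value to the output. -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Built-in rule for comments and processing instructions: output nothing. -->
    <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

The Excel export has many elements outside `y:Table` (Author and LastAuthor, for example), so the defaults walk through them and copy their text into the result. An empty `<xsl:template match="text()"/>` overrides the second rule, and `<xsl:strip-space elements="*"/>` removes whitespace-only text nodes from the source tree before the transformation runs. The `<xsl:value-of select="."/>` inside Cell is unaffected, because `xsl:value-of` reads the string value directly and does not apply templates.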

Your test XML and XSL do not declare or use any namespaces, whereas the Excel XML export declares several:

xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
Filburt