
I have the following XML:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
  <Names>
    <NamedRange ss:Name="SomeNamedRange" ss:RefersTo="=Control!R1C1:R51C4"/>
  </Names>
  <Worksheet ss:Name="Control" ss:Protected="1">
    <Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount="51">
      <Row>
        <Cell ss:StyleID="s145">          
          <Comment ss:Author="Some comment here">
            <ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>
          </Comment>          
        </Cell>
      </Row>      
    </Table>
  </Worksheet>
</Workbook>

I would like to get the Names element with XPath, so I try:

//Names

but this doesn't match anything. So far, I have found several ways to fix this:

//ss:Names
//*:Names
//*[local-name()='Names']

Alternatively, I can delete the following element:

<ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>
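For comparison, here is a minimal sketch in Python using the standard-library `xml.etree.ElementTree`. This is an assumption on my part: it is not an XPath engine, and its limited path syntax only approximates the expressions above. It shows that with a namespace-aware processor, the unprefixed `Names` element is in the default namespace, so an unqualified name test finds nothing, while each namespace-aware form finds it:

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"

# The workbook from the question, trimmed to the part that matters here.
XML = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Names>
    <NamedRange ss:Name="SomeNamedRange" ss:RefersTo="=Control!R1C1:R51C4"/>
  </Names>
</Workbook>"""

root = ET.fromstring(XML)

# Unprefixed name test: only matches elements in no namespace (like //Names).
unqualified = root.findall(".//Names")
# Prefix bound by the caller (like //ss:Names once ss is declared to the evaluator).
prefixed = root.findall(".//ss:Names", {"ss": SS})
# Any-namespace wildcard, Python 3.8+ (roughly XPath 2.0 //*:Names).
wildcard = root.findall(".//{*}Names")
# Compare local names only (like //*[local-name()='Names']).
by_local_name = [el for el in root.iter() if el.tag.split("}")[-1] == "Names"]

print(len(unqualified), len(prefixed), len(wildcard), len(by_local_name))  # 0 1 1 1
```

In XPath 1.0, an unprefixed name test always means "no namespace"; the document's default namespace is never applied to it, which is why a prefix has to be bound in the evaluator rather than in the XML.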

So clearly, this is something to do with namespaces, but I still don't really understand what's going on. So I have three questions:

  1. Why does deleting the ss:Data element affect being able to read the Names element?
  2. Given that there are 5 namespaces declared at the top, why is the Names element considered to be in the ss namespace (when the ss:Data element exists)?
  3. What is the correct general approach here? I feel like there is some general piece of information I'm missing about either XML or XPath.
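One way to see how the prefixes resolve is to print each element's expanded name. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as a namespace-aware reference; the `<p/>` child inside `ss:Data` is hypothetical and added only to show which elements the inner `xmlns` declaration affects:

```python
import xml.etree.ElementTree as ET

XML = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Names/>
  <Comment>
    <ss:Data xmlns="http://www.w3.org/TR/REC-html40"><p/></ss:Data>
  </Comment>
</Workbook>"""

# ElementTree reports every element name as {namespace-uri}local-name.
tags = [el.tag for el in ET.fromstring(XML).iter()]
for tag in tags:
    print(tag)
```

`Names` (unprefixed) and `ss:Data` (prefixed) both resolve to `urn:schemas-microsoft-com:office:spreadsheet`, because an element's namespace is the URI its prefix or default declaration maps to, not the prefix itself. The inner `xmlns` only changes the default for unprefixed descendants of `ss:Data`, such as the hypothetical `p`.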

EDIT:

This issue is not limited to http://xpather.com/. I have had various results with different XPath websites, and have summarised the results in [an answer](https://stackoverflow.com/a/59246808/1158174).

bornfromanegg
  • Your `ss` prefix refers to the same namespace as the default namespace defined by `xmlns`, `urn:schemas-microsoft-com:office:spreadsheet`. It is that value that defines which namespace the element belongs to, not the prefix. So all `ss`-prefixed elements and all non-prefixed elements belong to the same namespace, `urn:schemas-microsoft-com:office:spreadsheet`. The `<ss:Data xmlns="http://www.w3.org/TR/REC-html40">` element changes the default namespace for its children, so an unprefixed `Names` inside it will not qualify for `urn:schemas-microsoft-com:office:spreadsheet`. – GSerg Dec 06 '19 at 11:45
  • Ah, ok, I think I understand the first bit. As for the second part, the `Names` element is not a child of the `ss:Data` element, so why would it be affected by it? – bornfromanegg Dec 06 '19 at 11:54
  • I assumed the XML you are showing is not complete. If it is, then it should not be affected, and you should add to your question exactly how you execute these XPath expressions and display the results. – GSerg Dec 06 '19 at 11:55
  • @GSerg Well, I deleted a lot of XML to create a minimal example, but it is complete XML (I think), even if it does not actually represent a valid Excel document. To test, I pasted the complete XML above into http://xpather.com/, then entered the XPaths above. (I used a bunch of other XPath websites as well; I don't think it matters, although they don't all understand the `//*:Names` syntax.) – bornfromanegg Dec 06 '19 at 12:01
  • It will matter because you can't just use a prefix in XPath from C#, you need to declare it first in [one way](https://stackoverflow.com/a/585822/11683) or [another](https://stackoverflow.com/a/5501543/11683). Please try from C# and post the results. – GSerg Dec 06 '19 at 12:05
  • Yes, I am aware of that, but my question is _why_ removing the `<ss:Data>` element means that I can read the `<Names>` element without specifying the namespace. I'm not sure what I can do in C# that will answer that question for me. – bornfromanegg Dec 06 '19 at 12:12
  • Currently you are observing the behaviour of a particular XML tool, thus the answer would be "because it could be a bug or an obscure feature of that tool". So please see if you can replicate that behaviour with C#. I can't; removing the `ss:Data` element does not let me magically read `Names` without using a prefix from C#. – GSerg Dec 06 '19 at 12:25
  • Ah, ok. Well that would make more sense. Ok, I will try in C#. Will not be able to do that immediately - so will post back when I’ve had the chance. Thanks. – bornfromanegg Dec 06 '19 at 12:28
  • This is not a duplicate of the proposed question. That is asking how to specify namespaces. I am asking _why_ I need to specify a namespace in my specific example. – bornfromanegg Dec 06 '19 at 13:35
  • The duplicate links will explain to future readers how to use XPath with namespaces. Sorry, but there will be more of them than there are of you given how you've titled your question. Your particular problem is much more narrow: You've stumbled across a bug in xpather.com. Note that their default XML has the following disclaimer regarding namespaces: *XPath 2.0 is supported but **namespaces are still being added and they may not fully work yet.*** You are right to be puzzled: Just deleting `ss:Data` should not cause `//Names` to suddenly select something. – kjhughes Dec 06 '19 at 14:24
  • Perhaps for all concerned it would help most if I re-opened your question, retitled it, and added the above as an answer. Will do.... – kjhughes Dec 06 '19 at 14:29
  • @kjhughes Thanks for that. If that’s the case, that would be great (I can’t check right now). I thought I’d checked this on other XPath sites but perhaps not. – bornfromanegg Dec 06 '19 at 14:35
  • ...done. You're welcome. Any other online XPath site that behaves as XPather.com does here would also be non-compliant. If you find any, please let us know. I've emailed a link to this Q/A to `xpather.com@gmail.com`. – kjhughes Dec 06 '19 at 14:46
  • @kjhughes Thanks for your help. I've added [an answer](https://stackoverflow.com/a/59246808/1158174) summarising my experiences with the XPath sites that I tried. – bornfromanegg Dec 09 '19 at 10:33

2 Answers


You are right to be puzzled.

Just deleting ss:Data should not cause //Names to suddenly select the Names child of Workbook when Workbook declares a default namespace of urn:schemas-microsoft-com:office:spreadsheet. You appear to have stumbled across a bug in xpather.com. Note that their opening, default XML has the following disclaimer regarding namespaces:
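A quick cross-check with a namespace-aware parser is consistent with this. The sketch below uses Python's standard-library `xml.etree.ElementTree` (an assumption: it is not one of the online testers discussed, and it supports only a subset of XPath) on a trimmed copy of the question's XML, with and without `ss:Data`:

```python
import xml.etree.ElementTree as ET

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}
DATA = '<ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>'

# Trimmed version of the question's workbook, including the <ss:Data> element.
WITH_DATA = f"""<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Names/>
  <Worksheet ss:Name="Control">
    <Table><Row><Cell>
      <Comment ss:Author="Some comment here">{DATA}</Comment>
    </Cell></Row></Table>
  </Worksheet>
</Workbook>"""

# The same document with <ss:Data> deleted.
WITHOUT_DATA = WITH_DATA.replace(DATA, "")

results = {}
for label, text in (("with ss:Data", WITH_DATA), ("without ss:Data", WITHOUT_DATA)):
    root = ET.fromstring(text)
    # (matches for an unqualified Names, matches for ss:Names)
    results[label] = (len(root.findall(".//Names")), len(root.findall(".//ss:Names", NS)))
    print(label, results[label])
```

Both variants print `(0, 1)`: the `xmlns` on `ss:Data` is scoped to that element and its descendants, so deleting it cannot change the namespace of `Names`.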

This application is in an early beta version so please be forgiving. XPath 2.0 is supported but namespaces are still being added and they may not fully work yet. Please send your comments to: xpather.com@gmail.com

See also (for general XPath in namespaces guidance):


Another xpather.com issue

Currently, xpather.com does not understand that element names may include period (.) characters.
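For reference, the XML 1.0 `Name` production allows `.` (as well as `-` and digits) anywhere after the first character, so a conforming parser accepts such names. A quick check with Python's standard-library parser; the element names here are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical element names containing periods; both are well-formed XML names.
root = ET.fromstring("<config.root><item.entry>value</item.entry></config.root>")

# iter() matches the tag string exactly, periods included.
entries = [el.text for el in root.iter("item.entry")]

print(root.tag)  # config.root
print(entries)   # ['value']
```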


And yet another xpather.com issue

This fully compliant XPath,

//item/comment()[not(preceding-sibling::*)]

results in the following (improper) error message on xpather.com:

TypeError: Cannot read property 'childPosition' of undefined
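To show what the expression is meant to select (comment children of `item` that come before any element sibling, since `preceding-sibling::*` only matches elements), here is a rough emulation with Python's `xml.etree.ElementTree`, which cannot evaluate this XPath directly. The sample document is hypothetical, and keeping comments in the tree requires `TreeBuilder(insert_comments=True)` (Python 3.8+):

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two <item> elements with comments in different positions.
XML = """<root>
  <item><!-- a --><x/><!-- b --></item>
  <item><!-- c --></item>
</root>"""

# Keep comments as nodes in the tree (they are dropped by default).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(XML, parser=parser)

def leading_comments(item):
    """Text of comments in `item` that have no preceding element sibling."""
    found = []
    for child in item:
        if child.tag is ET.Comment:
            found.append(child.text.strip())
        else:
            break  # first element sibling reached; later comments do not qualify
    return found

# Rough equivalent of //item/comment()[not(preceding-sibling::*)]
selected = [c for item in root.iter("item") for c in leading_comments(item)]
print(selected)  # ['a', 'c']
```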

kjhughes
  • Thanks for this. I hadn't noticed the beta comment, but I will keep an eye on xpather since it is looking good so far. I have added my results with various websites in [this answer](https://stackoverflow.com/a/59246808/1158174), in case it proves useful to future visitors. – bornfromanegg Dec 09 '19 at 10:27

I've decided to add this as an answer rather than an edit to the original question, since I may still be missing something. Thanks to the comments and answer from @GSerg and @kjhughes, I did some investigation. If this turns out to be useful, I can edit the question and add it in.

The following table lists a handful of websites for online XPath evaluation and how they behaved in my scenario.

+--------------------------------------------------------+--------------+-------------+------------+------------+
|                                                        |     With <ss:Data>         |    Without <ss:Data>    |
+--------------------------------------------------------+--------------+-------------+------------+------------+
|                                                        | //Names      | //ss:Names  | //Names    | //ss:Names |
+--------------------------------------------------------+--------------+-------------+------------+------------+
| https://www.freeformatter.com/xpath-tester.html        | No Match     | Match       | Match      | Match      |
| https://codebeautify.org/Xpath-Tester                  | No Match     | No Match    | No Match   | No Match   |
| http://xpather.com/                                    | No Match     | Match       | Match      | Match      |
| https://www.webtoolkitonline.com/xml-xpath-tester.html | No Match     | Error       | No Match   | Error      |
| http://www.utilities-online.info/xpath/#.Xe4VtTP7QuU   | No Match     | No Match    | No Match   | No Match   |
| https://extendsclass.com/xpath-tester.html             | No Match     | Match       | No Match   | Match      |
+--------------------------------------------------------+--------------+-------------+------------+------------+

From what I understand of the answers so far, the only site that behaves completely correctly seems to be ExtendsClass, although freeformatter and xpather do produce the right results when the namespace prefix is specified.

It should also be pointed out that xpather does clearly announce its beta status, and also has a nice UI.

bornfromanegg