How to ignore namespace when selecting XML nodes with XPath

Question

I have to parse an XML document that looks like this:

 <?xml version="1.0" encoding="UTF-8" ?> 
 <m:OASISReport xmlns:m="http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd" 
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd">
  <m:MessagePayload>
   <m:RTO>
    <m:name>CAISO</m:name> 
    <m:REPORT_ITEM>
     <m:REPORT_HEADER>
      <m:SYSTEM>OASIS</m:SYSTEM> 
      <m:TZ>PPT</m:TZ> 
      <m:REPORT>AS_RESULTS</m:REPORT> 
      <m:MKT_TYPE>HASP</m:MKT_TYPE> 
      <m:UOM>MW</m:UOM> 
      <m:INTERVAL>ENDING</m:INTERVAL> 
      <m:SEC_PER_INTERVAL>3600</m:SEC_PER_INTERVAL> 
     </m:REPORT_HEADER>
     <m:REPORT_DATA>
      <m:DATA_ITEM>NS_PROC_MW</m:DATA_ITEM> 
      <m:RESOURCE_NAME>AS_SP26_EXP</m:RESOURCE_NAME> 
      <m:OPR_DATE>2010-11-17</m:OPR_DATE> 
      <m:INTERVAL_NUM>1</m:INTERVAL_NUM> 
      <m:VALUE>0</m:VALUE> 
     </m:REPORT_DATA>

The problem is that the namespace "http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd" can sometimes be different. I want to ignore it completely and just get my data from tag MessagePayload downstream.

The code I am using so far is:

String[] namespaces = new String[1];
  String[] namespaceAliases = new String[1];

  namespaceAliases[0] = "ns0";
  namespaces[0] = "http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd";

  File inputFile = new File(inputFileName);

  Map namespaceURIs = new HashMap();

  // This query will return all of the ASR records.
  String xPathExpression = "/ns0:OASISReport"
                         + "/ns0:MessagePayload"
                         + "/ns0:RTO"
                         + "/ns0:REPORT_ITEM"
                         + "/ns0:REPORT_DATA";
  xPathExpression += "|/ns0:OASISReport"
                   + "/ns0:MessagePayload"
                   + "/ns0:RTO"
                   + "/ns0:REPORT_ITEM"
                   + "/ns0:REPORT_HEADER";

  // Load up the raw XML file. The parameters ignore whitespace and other
  // nonsense,
  // reduces DOM tree size.
  SAXReader reader = new SAXReader();
  reader.setStripWhitespaceText(true);
  reader.setMergeAdjacentText(true);
  Document inputDocument = reader.read(inputFile);

  // Relate the aliases with the namespaces
  if (namespaceAliases != null && namespaces != null)
  {
   for (int i = 0; i < namespaceAliases.length; i++)
   {
    namespaceURIs.put(namespaceAliases[i], namespaces[i]);
   }
  }

  // Cache the expression using the supplied namespaces.
  XPath xPath = DocumentHelper.createXPath(xPathExpression);
  xPath.setNamespaceURIs(namespaceURIs);

  List asResultsNodes = xPath.selectNodes(inputDocument.getRootElement());

It works fine if the namespace never changes but that is obviously not the case. What do I need to do to make it ignore the namespace? Or if I know the set of all possible namespace values, how can I pass them all to the XPath instance?

@user452103: XPath is XML Names compliant, so it will never ignore namespaces. You can use an **expression** that selects nodes regardless of namespace. If the namespace URI changes so often, then it is the wrong URI. **A namespace URI is supposed to indicate that an element belongs to a specific XML vocabulary**. — , Dec 09 '10 at 19:49
@Alejandro: thanks for the formatting, it does look better now. What expression can I use to select nodes regardless of namespace? — lukegf, Dec 09 '10 at 20:15
Good question, +1. See my answer for a single XPath 1.0 expression that selects exactly the wanted nodes. :) — Dimitre Novatchev, Dec 09 '10 at 20:32
Point of terminology... Re: title of question... XPath is not for *parsing* XML but for selecting nodes in an XML document tree that has already been parsed. Parsing is the process of reading a linear stream of characters, such as a file, and constructing a data structure, such as a node tree. — LarsH, Dec 09 '10 at 23:08
@Alejandro has a good point: the namespace URI should not change unless you are changing to a different XML vocabulary... in which case, you will have to change your code anyway. So whoever made a design decision to change the namespace URI frequently needs some training about XML namespaces. — LarsH, Dec 09 '10 at 23:20
https://stackoverflow.com/questions/4440451/how-to-ignore-namespaces-with-xpath — n611x007, Sep 21 '15 at 12:47
"Should" is the key word. But you are failing to take into account the concept of interfacing. Where one system is communicating with another system, same XML standard defined but separate entities. There is no XML authority that is going to tell company A to name it like company B. While the content may match, the namespace can be anything. — Chris, Jan 26 '17 at 15:17
You could set Namespaces = false on an XmlTextReader; see: https://stackoverflow.com/a/49361232/9516092 — Pierre Vonderscher, Mar 19 '18 at 11:04

score 132 · Answer 1 · answered Dec 09 '10 at 20:20


This is a FAQ (but I'm too lazy to search for duplicates today).

In XPath 1.0

//*[local-name()='name']

Selects any element with "name" as local-name.

In XPath 2.0 you can use:

//*:name
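As a rough illustration of the XPath 1.0 form, here is a minimal sketch that uses the JDK's built-in `javax.xml.xpath` API rather than the dom4j classes from the question. The class name `LocalNameDemo`, the helper `firstTextByLocalName`, and the `urn:example:*` namespace URIs are made up for the example. It assumes the parser is namespace-aware, which the JDK's `DocumentBuilderFactory` is not by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameDemo {

    // Hypothetical helper: returns the text of the first element whose local
    // name matches, whatever namespace it belongs to, or null if none matches.
    static String firstTextByLocalName(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The JDK factory is not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Assumes localName is a plain element name containing no quotes.
            String expression = "//*[local-name()='" + localName + "']";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Same vocabulary, two different (made-up) namespace URIs and prefixes.
        String a = "<m:RTO xmlns:m=\"urn:example:one\"><m:name>CAISO</m:name></m:RTO>";
        String b = "<x:RTO xmlns:x=\"urn:example:two\"><x:name>CAISO</x:name></x:RTO>";
        System.out.println(firstTextByLocalName(a, "name"));
        System.out.println(firstTextByLocalName(b, "name"));
    }
}
```

Both calls match the `name` element, because the predicate compares only the local part of the element name and never looks at the namespace URI.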


Does the FAQ explain why namespaces are not ignored by default, which is what 95% of users want? How often does one really need namespaces for disambiguation? – frenchone Feb 15 '23 at 15:28
@frenchone Have never, not once, needed the namespace. – Zimano Jun 08 '23 at 11:40

Dimitre Novatchev · Accepted Answer · 2021-09-29T00:02:44.823

45

Use:

/*/*/*/*/*
        [local-name()='REPORT_DATA' 
       or 
         local-name()='REPORT_HEADER'
        ]
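To see what this expression picks out, here is a small sketch that again uses the JDK's `javax.xml.xpath` API instead of dom4j. The class `ReportNodesDemo`, the helper `selectLocalNames`, the closing tags added to the trimmed sample, and the `urn:example:changed` namespace URI are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReportNodesDemo {

    // Trimmed version of the question's document. The closing tags and the
    // namespace URI are assumed; the URI is changed on purpose.
    static final String SAMPLE =
          "<m:OASISReport xmlns:m=\"urn:example:changed\">"
        + "<m:MessagePayload><m:RTO><m:name>CAISO</m:name><m:REPORT_ITEM>"
        + "<m:REPORT_HEADER><m:SYSTEM>OASIS</m:SYSTEM></m:REPORT_HEADER>"
        + "<m:REPORT_DATA><m:VALUE>0</m:VALUE></m:REPORT_DATA>"
        + "</m:REPORT_ITEM></m:RTO></m:MessagePayload></m:OASISReport>";

    static final String EXPRESSION =
        "/*/*/*/*/*[local-name()='REPORT_DATA' or local-name()='REPORT_HEADER']";

    // Hypothetical helper: evaluates the expression and returns the local
    // names of the selected elements.
    static List<String> selectLocalNames(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for local-name() to be meaningful
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectLocalNames(SAMPLE, EXPRESSION));
    }
}
```

Each `*` step matches an element in any namespace, so the five steps fix the depth while the predicate filters by local name. Elements with the same local names at any other depth would not be selected.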

edited Sep 29 '21 at 00:02

answered Dec 09 '10 at 20:30


do you mean to use that as the value of xPathExpression in the code above? – lukegf Dec 09 '10 at 20:42
@user452103: Yes, exactly. This is the XPath expression to use. – Dimitre Novatchev Dec 09 '10 at 21:59
so, just to clarify, should it be like this now: String xPathExpression = "/*/*/*/*/*[local-name()='REPORT_DATA' or local-name()='REPORT_HEADER']"; – lukegf Dec 10 '10 at 14:55
@user452103: Yes. Why don't you just try it? This expression selects the two wanted nodes in the provided XML document. – Dimitre Novatchev Dec 10 '10 at 15:33
@ClaraOnager, This selects any element on the 4th level below the top, whose local-name() is either 'REPORT_DATA' or 'REPORT_HEADER' – Dimitre Novatchev Jul 03 '13 at 14:24
