
I have to parse an XML document that looks like this:

 <?xml version="1.0" encoding="UTF-8" ?> 
 <m:OASISReport xmlns:m="http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd" 
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd">
  <m:MessagePayload>
   <m:RTO>
    <m:name>CAISO</m:name> 
    <m:REPORT_ITEM>
     <m:REPORT_HEADER>
      <m:SYSTEM>OASIS</m:SYSTEM> 
      <m:TZ>PPT</m:TZ> 
      <m:REPORT>AS_RESULTS</m:REPORT> 
      <m:MKT_TYPE>HASP</m:MKT_TYPE> 
      <m:UOM>MW</m:UOM> 
      <m:INTERVAL>ENDING</m:INTERVAL> 
      <m:SEC_PER_INTERVAL>3600</m:SEC_PER_INTERVAL> 
     </m:REPORT_HEADER>
     <m:REPORT_DATA>
      <m:DATA_ITEM>NS_PROC_MW</m:DATA_ITEM> 
      <m:RESOURCE_NAME>AS_SP26_EXP</m:RESOURCE_NAME> 
      <m:OPR_DATE>2010-11-17</m:OPR_DATE> 
      <m:INTERVAL_NUM>1</m:INTERVAL_NUM> 
      <m:VALUE>0</m:VALUE> 
     </m:REPORT_DATA>

The problem is that the namespace "http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd" can sometimes be different. I want to ignore it completely and just get my data from the MessagePayload element downward.

The code I am using so far is:

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;

public class AsResultsReader {

    public static List<Node> readAsResults(String inputFileName) throws DocumentException {
        String[] namespaces = new String[1];
        String[] namespaceAliases = new String[1];

        namespaceAliases[0] = "ns0";
        namespaces[0] = "http://oasissta.caiso.com/mrtu-oasis/xsd/OASISReport.xsd";

        File inputFile = new File(inputFileName);

        Map<String, String> namespaceURIs = new HashMap<>();

        // This query will return all of the ASR records.
        // Java string literals cannot span lines, so the path is concatenated.
        String xPathExpression = "/ns0:OASISReport"
                + "/ns0:MessagePayload"
                + "/ns0:RTO"
                + "/ns0:REPORT_ITEM"
                + "/ns0:REPORT_DATA";
        xPathExpression += "|/ns0:OASISReport"
                + "/ns0:MessagePayload"
                + "/ns0:RTO"
                + "/ns0:REPORT_ITEM"
                + "/ns0:REPORT_HEADER";

        // Load the raw XML file. Stripping whitespace-only text and merging
        // adjacent text nodes reduces the DOM tree size.
        SAXReader reader = new SAXReader();
        reader.setStripWhitespaceText(true);
        reader.setMergeAdjacentText(true);
        Document inputDocument = reader.read(inputFile);

        // Relate the aliases with the namespaces.
        for (int i = 0; i < namespaceAliases.length; i++) {
            namespaceURIs.put(namespaceAliases[i], namespaces[i]);
        }

        // Create the XPath object once, using the supplied namespace mappings.
        XPath xPath = DocumentHelper.createXPath(xPathExpression);
        xPath.setNamespaceURIs(namespaceURIs);

        return xPath.selectNodes(inputDocument.getRootElement());
    }
}

It works fine if the namespace never changes, but that is obviously not the case. What do I need to do to make it ignore the namespace? Or, if I know the set of all possible namespace values, how can I pass them all to the XPath instance?
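A common workaround for the second part of the question (the URI varies but the structure does not) is to bind the prefix at run time to whatever namespace URI the root element actually uses, so a prefixed expression like the one in the code above keeps working. The sketch below uses the JDK's built-in javax.xml.xpath API rather than dom4j, and the class and method names are hypothetical. It assumes every element of interest shares the root element's namespace, as in the sample document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RootNamespaceBinding {

    // Evaluates an ns0-prefixed expression, binding ns0 to the namespace URI
    // the root element actually uses (hypothetical helper, not a dom4j API).
    public static NodeList select(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // prefixed XPath only matches on a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        // Read the URI from the document instead of hard-coding it.
        String uri = doc.getDocumentElement().getNamespaceURI();
        final String rootUri = (uri == null) ? XMLConstants.NULL_NS_URI : uri;

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "ns0".equals(prefix) ? rootUri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return rootUri.equals(namespaceURI) ? "ns0" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return rootUri.equals(namespaceURI)
                        ? Collections.singletonList("ns0").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        String expr = "/ns0:OASISReport/ns0:MessagePayload/ns0:RTO/ns0:name";
        String a = "<m:OASISReport xmlns:m='urn:example:a'><m:MessagePayload><m:RTO>"
                + "<m:name>CAISO</m:name></m:RTO></m:MessagePayload></m:OASISReport>";
        String b = a.replace("urn:example:a", "urn:example:b"); // same document, different URI
        System.out.println(select(a, expr).item(0).getTextContent()); // prints CAISO
        System.out.println(select(b, expr).item(0).getTextContent()); // prints CAISO
    }
}
```

If several known URIs could appear in the same document, the same NamespaceContext could instead map one prefix per known URI, and the expression would then need a union over those prefixes.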

ShadowRanger
lukegf
  • @user452103: XPath is XML Names compliant, so it will never ignore namespaces. You can use an **expression** that selects nodes regardless of namespace. If the namespace URI changes that often, then it is the wrong URI. **A namespace URI is supposed to indicate that an element belongs to a specific XML vocabulary**. –  Dec 09 '10 at 19:49
  • @user452103: Keep this formatting; it's clearer. –  Dec 09 '10 at 19:54
  • @Alejandro: thanks for the formatting; it does look better now. What expression can I use to select nodes regardless of namespace? – lukegf Dec 09 '10 at 20:15
  • Good question, +1. See my answer for a single XPath 1.0 expression that selects exactly the wanted nodes. :) – Dimitre Novatchev Dec 09 '10 at 20:32
  • Point of terminology... Re: title of question... XPath is not for *parsing* XML but for selecting nodes in an XML document tree that has already been parsed. Parsing is the process of reading a linear stream of characters, such as a file, and constructing a data structure, such as a node tree. – LarsH Dec 09 '10 at 23:08
  • @Alejandro has a good point: the namespace URI should not change unless you are changing to a different XML vocabulary... in which case, you will have to change your code anyway. So whoever made a design decision to change the namespace URI frequently needs some training about XML namespaces. – LarsH Dec 09 '10 at 23:20
  • https://stackoverflow.com/questions/4440451/how-to-ignore-namespaces-with-xpath – n611x007 Sep 21 '15 at 12:47
  • "Should" is the key word. But you are failing to take into account the concept of interfacing: one system communicating with another, both implementing the same XML standard but run by separate entities. There is no XML authority that is going to tell company A to name it like company B. While the content may match, the namespace can be anything. – Chris Jan 26 '17 at 15:17
  • You could set Namespaces = false on an XmlTextReader; see: https://stackoverflow.com/a/49361232/9516092 – Pierre Vonderscher Mar 19 '18 at 11:04

2 Answers


This is a FAQ (but I'm too lazy to search for duplicates today).

In XPath 1.0

//*[local-name()='name']

selects any element whose local name is "name", whatever its namespace.
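For illustration, here is that XPath 1.0 expression evaluated with the JDK's javax.xml.xpath API (swapped in for dom4j so the example needs no external library). The class name and sample XML are made up; the point is that elements called name match whether they are in namespace urn:a, urn:b, or no namespace at all, as long as the parser is namespace-aware.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameDemo {

    // Counts elements whose local name is "name", regardless of namespace URI.
    public static int countNameElements(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // so local-name() strips the prefix
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[local-name()='name']", doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // name in namespace a, name in namespace b, name in no namespace, plus a non-match.
        String xml = "<root xmlns:a='urn:a' xmlns:b='urn:b'>"
                + "<a:name/><b:name/><name/><a:other/></root>";
        System.out.println(countNameElements(xml)); // prints 3
    }
}
```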

In XPath 2.0 you can use:

//*:name
  • Does the FAQ explain why namespaces are not ignored by default, which is what 95% of users want? How often does one really need namespaces for disambiguation? – frenchone Feb 15 '23 at 15:28
  • @frenchone Have never, not once, needed the namespace. – Zimano Jun 08 '23 at 11:40

Use:

/*/*/*/*/*
        [local-name()='REPORT_DATA' 
       or 
         local-name()='REPORT_HEADER'
        ]
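A minimal sketch of applying this expression, again with javax.xml.xpath instead of dom4j, on a cut-down, hypothetical version of the question's document. It checks that the same nodes come back when only the namespace URI changes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReportNodesDemo {

    // The expression from the answer: fifth-level elements named REPORT_DATA or REPORT_HEADER.
    static final String EXPR =
            "/*/*/*/*/*[local-name()='REPORT_DATA' or local-name()='REPORT_HEADER']";

    public static NodeList select(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
    }

    // Hypothetical, trimmed-down version of the question's document with a configurable URI.
    public static String sample(String uri) {
        return "<m:OASISReport xmlns:m='" + uri + "'>"
                + "<m:MessagePayload><m:RTO><m:name>CAISO</m:name><m:REPORT_ITEM>"
                + "<m:REPORT_HEADER><m:SYSTEM>OASIS</m:SYSTEM></m:REPORT_HEADER>"
                + "<m:REPORT_DATA><m:VALUE>0</m:VALUE></m:REPORT_DATA>"
                + "</m:REPORT_ITEM></m:RTO></m:MessagePayload></m:OASISReport>";
    }

    public static void main(String[] args) throws Exception {
        for (String uri : new String[] {"urn:example:one", "urn:example:two"}) {
            NodeList nodes = select(sample(uri));
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            System.out.println(uri + " -> " + names); // [REPORT_HEADER, REPORT_DATA] both times
        }
    }
}
```

Because the expression counts steps (/*/*/*/*/*), it assumes REPORT_HEADER and REPORT_DATA sit exactly five levels below the document root, as they do in the question's document. Since it uses no prefixes, it should also work as the string passed to dom4j's DocumentHelper.createXPath with no namespace map at all.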
Dimitre Novatchev