
I'm trying to do something that ought to be quite simple, but I'm having terrible trouble. I have tried code from multiple similar questions on Stack Overflow, to no avail. I'm trying to get various pieces of information from an ABN lookup with the Australian government. Here is the anonymised XML it returns:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">
            <ABRPayloadSearchResults>
                <request>
                    <identifierSearchRequest>
                        <authenticationGUID>00000000-0000-0000-0000-000000000000</authenticationGUID>
                        <identifierType>ABN</identifierType>
                        <identifierValue>00 000 000 000</identifierValue>
                        <history>N</history>
                    </identifierSearchRequest>
                </request>
                <response>
                    <usageStatement>The Registrar of the ABR monitors the quality of the information available on this website and updates the information regularly. However, neither the Registrar of the ABR nor the Commonwealth guarantee that the information available through this service (including search results) is accurate, up to date, complete or accept any liability arising from the use of or reliance upon this site.</usageStatement>
                    <dateRegisterLastUpdated>2017-01-01</dateRegisterLastUpdated>
                    <dateTimeRetrieved>2017-01-01T00:00:00.2016832+10:00</dateTimeRetrieved>
                    <businessEntity>
                        <recordLastUpdatedDate>2017-01-01</recordLastUpdatedDate>
                        <ABN>
                            <identifierValue>00000000000</identifierValue>
                            <isCurrentIndicator>Y</isCurrentIndicator>
                            <replacedFrom>0001-01-01</replacedFrom>
                        </ABN>
                        <entityStatus>
                            <entityStatusCode>Active</entityStatusCode>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </entityStatus>
                        <ASICNumber>000000000</ASICNumber>
                        <entityType>
                            <entityTypeCode>PRV</entityTypeCode>
                            <entityDescription>Australian Private Company</entityDescription>
                        </entityType>
                        <goodsAndServicesTax>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </goodsAndServicesTax>
                        <mainName>
                            <organisationName>COMPANY LTD</organisationName>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                        </mainName>
                        <mainBusinessPhysicalAddress>
                            <stateCode>NSW</stateCode>
                            <postcode>0000</postcode>
                            <effectiveFrom>2017-01-01</effectiveFrom>
                            <effectiveTo>0001-01-01</effectiveTo>
                        </mainBusinessPhysicalAddress>
                    </businessEntity>
                </response>
            </ABRPayloadSearchResults>
        </ABRSearchByABNResponse>
    </soap:Body>
</soap:Envelope>

So I want to get, for example, the whole response using the XPath "//response", then use various XPath expressions within that node to get the <organisationName> ("//mainName/organisationName") and other values. It should be simple, right? Those XPath expressions appear to work when I test them in Notepad++, but this is the code I use in Visual Studio:

XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(ipxml);
XmlNode xnode = xdoc.SelectSingleNode("//response");
XmlNodeList xlist = xdoc.SelectNodes("//mainName/organisationName");
xlist = xdoc.GetElementsByTagName("mainName");

But it always returns null: whatever XPath I use, I get null for the node and a count of 0 for the list, whether I'm selecting an element with child nodes or one holding a value. I can get the nodes using GetElementsByTagName(), as in the example, which returns the correct node, but I wanted to do it 'properly' by selecting the exact field with XPath.

I also tried using XElement and Linq but still no luck. Is there something weird about the XML?

I'm sure it must be something simple, but I've been struggling for ages.

StuartLC
Chris H

2 Answers


You aren't dealing with the namespaces present in the document. Specifically, the high level element:

<ABRSearchByABNResponse xmlns="http://abr.business.gov.au/ABRXMLSearch/">

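places ABRSearchByABNResponse and all of its descendant elements (unless overridden by another xmlns declaration) into the namespace http://abr.business.gov.au/ABRXMLSearch/. XPath 1.0 has no notion of a default namespace: an unprefixed name such as response in an expression only matches elements that are in no namespace, which is why every query returns nothing. In order to navigate to these nodes (without hacks like GetElementsByTagName or using local-name()), you'll need to bind a prefix to the namespace URI with an XmlNamespaceManager and use that prefix in your expressions, like so. The prefixes don't need to match those used in the original document (only the URIs must match), but it's a good convention to reuse them where the document has one: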

XmlDocument

using System.Xml;

var xdoc = new XmlDocument();
var ns = new XmlNamespaceManager(xdoc.NameTable);
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

xdoc.LoadXml(ipxml);
// NB: use the overload that accepts the XmlNamespaceManager
var xresponse = xdoc.SelectSingleNode("//abr:response", ns);
var xlist = xdoc.SelectNodes("//abr:mainName/abr:organisationName", ns);
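To see the XmlDocument approach end to end, here is a self-contained sketch. It assumes C# 9+ top-level statements and a trimmed copy of the sample response (only the elements used are kept); the helper names GetOrganisationName and UnprefixedQueryFindsNothing are hypothetical. It selects <response> first and then queries relative to it, which is what the question was after, and also shows that the unprefixed query from the question finds nothing.

```csharp
using System;
using System.Xml;

// Trimmed copy of the sample response (assumption: only these elements are needed here).
const string SampleXml = @"<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">
  <soap:Body>
    <ABRSearchByABNResponse xmlns=""http://abr.business.gov.au/ABRXMLSearch/"">
      <ABRPayloadSearchResults>
        <response>
          <businessEntity>
            <entityStatus><entityStatusCode>Active</entityStatusCode></entityStatus>
            <mainName><organisationName>COMPANY LTD</organisationName></mainName>
            <mainBusinessPhysicalAddress><stateCode>NSW</stateCode></mainBusinessPhysicalAddress>
          </businessEntity>
        </response>
      </ABRPayloadSearchResults>
    </ABRSearchByABNResponse>
  </soap:Body>
</soap:Envelope>";

// Hypothetical helper: returns the organisation name, or null if it cannot be found.
static string? GetOrganisationName(string ipxml)
{
    var xdoc = new XmlDocument();
    xdoc.LoadXml(ipxml);

    // The prefix "abr" is our own choice; only the namespace URI has to match the document.
    var ns = new XmlNamespaceManager(xdoc.NameTable);
    ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

    // Select <response> first, then query relative to it (no leading "/" or "//").
    XmlNode? response = xdoc.SelectSingleNode("//abr:response", ns);
    XmlNode? name = response?.SelectSingleNode("abr:businessEntity/abr:mainName/abr:organisationName", ns);
    return name?.InnerText;
}

// Hypothetical helper: without a prefix, "response" means "response in no namespace", so nothing matches.
static bool UnprefixedQueryFindsNothing(string ipxml)
{
    var doc = new XmlDocument();
    doc.LoadXml(ipxml);
    return doc.SelectSingleNode("//response") == null;
}

Console.WriteLine(GetOrganisationName(SampleXml));         // COMPANY LTD
Console.WriteLine(UnprefixedQueryFindsNothing(SampleXml)); // True
```

Note that the relative path passed to response.SelectSingleNode still needs the abr: prefix on every step, because each element along the path is in the ABR namespace.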

XDocument

More recently, LINQ to XML (XDocument) has made working with namespaces much easier: you combine an XNamespace with a local name using + to build the fully qualified element name. Descendants finds matching elements at any depth:

using System.Xml.Linq;

var xdoc = XDocument.Parse(ipxml);
XNamespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
XNamespace abr = "http://abr.business.gov.au/ABRXMLSearch/";

var xresponse = xdoc.Descendants(abr + "response");
var xlist = xdoc.Descendants(abr + "organisationName");
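Descendants returns a sequence, so to read an actual value you still need to pick an element and take its text. A minimal sketch, assuming C# 9+ top-level statements and a trimmed copy of the sample response; GetOrganisationName and CountUnqualifiedMatches are hypothetical helper names. It walks from <response> down with Element (direct children only) and uses the explicit (string) conversion, which yields null rather than throwing when an element is missing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed copy of the sample response (assumption: only these elements are needed here).
const string SampleXml = @"<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">
  <soap:Body>
    <ABRSearchByABNResponse xmlns=""http://abr.business.gov.au/ABRXMLSearch/"">
      <ABRPayloadSearchResults>
        <response>
          <businessEntity>
            <mainName><organisationName>COMPANY LTD</organisationName></mainName>
          </businessEntity>
        </response>
      </ABRPayloadSearchResults>
    </ABRSearchByABNResponse>
  </soap:Body>
</soap:Envelope>";

// Hypothetical helper: navigates from <response> to <organisationName>.
static string? GetOrganisationName(string ipxml)
{
    var xdoc = XDocument.Parse(ipxml);
    XNamespace abr = "http://abr.business.gov.au/ABRXMLSearch/";

    // First <response> in the ABR namespace, at any depth.
    XElement? response = xdoc.Descendants(abr + "response").FirstOrDefault();

    // Element() only looks at direct children, so each step names the next level down.
    // The (string?) conversion gives null instead of throwing if any step is missing.
    return (string?)response
        ?.Element(abr + "businessEntity")
        ?.Element(abr + "mainName")
        ?.Element(abr + "organisationName");
}

// Hypothetical helper: a plain string converts to an XName with no namespace, so nothing matches.
static int CountUnqualifiedMatches(string ipxml) =>
    XDocument.Parse(ipxml).Descendants("organisationName").Count();

Console.WriteLine(GetOrganisationName(SampleXml));     // COMPANY LTD
Console.WriteLine(CountUnqualifiedMatches(SampleXml)); // 0
```

The second helper reproduces the original symptom in LINQ to XML terms: forgetting the namespace gives an empty result rather than an error.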

XDocument + XPath

You can also use XPath with LINQ to XML through the extension methods in System.Xml.XPath, which is handy for more complicated expressions:

using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

var xdoc = XDocument.Parse(ipxml);
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("soap", "http://schemas.xmlsoap.org/soap/envelope/");
ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

var xresponse = xdoc.XPathSelectElement("//abr:response", ns);
// XPathSelectElements (plural) returns all matches; XPathSelectElement returns only the first
var xlist = xdoc.XPathSelectElements("//abr:mainName/abr:organisationName", ns);
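The question also asked how to run further XPath expressions within the <response> node. One detail matters here: an expression starting with "//" is evaluated from the document root even when called on an element, whereas ".//" searches only below the context node. A sketch under stated assumptions (C# 9+ top-level statements, a trimmed copy of the sample response, and a hypothetical helper ReadResponseFields whose keys are simply the element local names):

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Trimmed copy of the sample response (assumption: only these elements are needed here).
const string SampleXml = @"<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">
  <soap:Body>
    <ABRSearchByABNResponse xmlns=""http://abr.business.gov.au/ABRXMLSearch/"">
      <ABRPayloadSearchResults>
        <response>
          <businessEntity>
            <entityStatus><entityStatusCode>Active</entityStatusCode></entityStatus>
            <mainName><organisationName>COMPANY LTD</organisationName></mainName>
            <mainBusinessPhysicalAddress><stateCode>NSW</stateCode></mainBusinessPhysicalAddress>
          </businessEntity>
        </response>
      </ABRPayloadSearchResults>
    </ABRSearchByABNResponse>
  </soap:Body>
</soap:Envelope>";

// Hypothetical helper: reads a few fields from inside <response>.
static Dictionary<string, string?> ReadResponseFields(string ipxml)
{
    var xdoc = XDocument.Parse(ipxml);
    var ns = new XmlNamespaceManager(new NameTable());
    ns.AddNamespace("abr", "http://abr.business.gov.au/ABRXMLSearch/");

    var fields = new Dictionary<string, string?>();
    XElement? response = xdoc.XPathSelectElement("//abr:response", ns);
    if (response == null) return fields;

    // ".//" searches below the context node (<response>);
    // a leading "//" would start again from the document root.
    var queries = new (string Key, string Path)[]
    {
        ("organisationName", ".//abr:mainName/abr:organisationName"),
        ("entityStatusCode", ".//abr:entityStatus/abr:entityStatusCode"),
        ("stateCode", ".//abr:mainBusinessPhysicalAddress/abr:stateCode"),
    };
    foreach (var (key, path) in queries)
    {
        fields[key] = response.XPathSelectElement(path, ns)?.Value;
    }
    return fields;
}

foreach (var pair in ReadResponseFields(SampleXml))
{
    Console.WriteLine($"{pair.Key} = {pair.Value}");
}
```

In this particular document there is only one <response>, so "//" and ".//" happen to find the same elements; the distinction starts to matter once a document contains several matching subtrees.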
StuartLC
  • By the way on a side note I notice you use 'var' rather than a particular type. Is that best practice or your preference? I don't do a lot of coding so will take any tips from those who do. – Chris H May 24 '18 at 14:14
  • `var` is strongly typed implicit typing, which pretty much all functional languages do. To use it or not is one of the [great religious debates](https://stackoverflow.com/questions/41479/use-of-var-keyword-in-c-sharp). Almost all code nowadays uses `var` - you can hover over the variable in the IDE to see the type. This shifts focus on giving the variable a decent name (which I haven't really done - blush) – StuartLC May 24 '18 at 14:22
  • I mean almost `all my code`. Not trying to bias you – StuartLC May 24 '18 at 15:06

You need to call SelectSingleNode and SelectNodes on the DocumentElement. You are calling them on the document itself.

For example:

XmlNode xnode = xdoc.DocumentElement.SelectSingleNode("//response");
Philip Smith