How to parse/extract url from an xml file?

Question

I have an XML file that contains the following type of data:

<definition name="/products/phone" path="/main/something.jsp" > </definition>

There are dozens of nodes in the xml file.

What I want to do is extract the path in the 'name' attribute so that my end result is:

http://www.mysite.com/products/phone.jsp

Can I do this with a so-called XML parser? I have no idea where to begin. Can someone point me in the right direction? What tools do I need to achieve something like this?

I am particularly interested in doing this with PHP.

score 1 · Answer 1 · edited May 23 '17 at 12:14

Given the basic XML above, it should be easy to read the path from the `name` attribute and combine it with your existing domain and the expected resource type (.jsp).

If you are comfortable with C#, and you know there is one and only one "definition" element, here is a small self-contained program that does what you require (it assumes you are loading the XML from a string):

using System;
using System.Xml;

public class ParseXml
{
    // No trailing slash: the "name" attribute already starts with "/".
    private const string myDomain = "http://www.mysite.com";
    private const string myExtension = ".jsp";

    public static void Main()
    {
        string xmlString = "<definition name='/products/phone' path='/main/something.jsp'> </definition>";

        XmlDocument doc = new XmlDocument();

        doc.LoadXml(xmlString);

        // .Value returns the attribute's text; .ToString() would return the type name "System.Xml.XmlAttribute".
        string fqdn =   myDomain +
                        doc.DocumentElement.SelectSingleNode("//definition").Attributes["name"].Value +
                        myExtension;

        Console.WriteLine("Original XML: {0}\nResultant FQDN: {1}", xmlString, fqdn);
    }
}
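Since the question asks for PHP, here is a rough PHP counterpart of the C# program above. It is a sketch under the same assumptions (a single "definition" element, XML loaded from a string) and uses PHP's built-in `DOMDocument::loadXML` and `DOMXPath`; the domain and extension are placeholders copied from the question's expected output.

```php
<?php
// Placeholder values taken from the question's expected output.
const MY_DOMAIN = 'http://www.mysite.com';
const MY_EXTENSION = '.jsp';

$xmlString = "<definition name='/products/phone' path='/main/something.jsp'> </definition>";

$doc = new DOMDocument();
$doc->loadXML($xmlString);

// Same XPath as the C# version; item(0) is the first (and assumed only) match.
$xpath = new DOMXPath($doc);
$definition = $xpath->query('//definition')->item(0);

$url = MY_DOMAIN . $definition->getAttribute('name') . MY_EXTENSION;
echo "Original XML: $xmlString\nResultant URL: $url\n";
```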

Be careful with SelectSingleNode above: it returns only the first node matching the XPath expression "//definition" (which searches the whole document, starting from the root). If there are several "definition" elements, use SelectNodes and loop over the result instead.

Fundamentally, it's worthwhile to read a primer on XML. XML is not difficult: it's a self-describing, hierarchical data format (lots of nested text, angle brackets, and quotation marks :) ).

A good primer would be the one at W3Schools: http://www.w3schools.com/xml/xml_whatis.asp

You may also want to read up on streaming (SAX/XmlReader) vs. loading (DOM/XmlDocument) XML: What is the difference between SAX and DOM?
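To illustrate the streaming side in PHP, here is a minimal sketch using the built-in `XMLReader` class, which walks the document one node at a time instead of building the whole tree in memory. It assumes the "definition" elements look like the sample in the question and reuses the placeholder domain and `.jsp` extension; the file name in the comment is hypothetical.

```php
<?php
$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
</root>
DATA;

$reader = new XMLReader();
$reader->XML($xml);   // for a file, use $reader->open('definitions.xml') (hypothetical name)

$urls = [];
// read() advances to the next node; only the current node is held at a time.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'definition') {
        $name = $reader->getAttribute('name');   // null if the attribute is missing
        if ($name !== null && $name !== '') {
            $urls[] = 'http://www.mysite.com' . $name . '.jsp';
        }
    }
}
$reader->close();

print_r($urls);
```

For a file with only dozens of nodes, loading it into a DOM is perfectly fine; streaming mainly pays off for very large files.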

I can provide a Java example too, if you feel that would be helpful.

This is very helpful for me to understand the logic. Much appreciated! Can you provide an example in PHP? The xml document contains dozens of "definition" nodes so I guess a for loop would be needed yes? Something like: `definition as $definition) { echo ..... } ?>` — Obi-Wan, Aug 30 '13 at 14:06
Unfortunately I don't know PHP :(, but the logic works like this: (1) Read the XML string or file into an XmlDocument object. (2) To get a collection of all "definition" elements beneath the XML root node, the same XPath expression ("//definition") still works: apply it to the document with SelectNodes to return a collection of "definition" elements. (3) Iterate through that collection with a foreach loop (as you do above) and construct the resulting FQDNs. Does this help? — AFKAP, Aug 30 '13 at 23:13
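The three steps in the comment above map directly onto PHP's built-in DOM extension. This is a minimal sketch, assuming the "definition" elements sit under a single root element and that the domain and `.jsp` extension are fixed; `$xmlString` stands in for your file, which you could load with `$doc->load('yourfile.xml')` instead (the file name is hypothetical).

```php
<?php
$xmlString = '<root>'
    . '<definition name="/products/phone" path="/main/something.jsp"/>'
    . '<definition name="/products/cell" path="/main/something.jsp"/>'
    . '</root>';

// (1) Read the XML into a document object.
$doc = new DOMDocument();
$doc->loadXML($xmlString);

// (2) Apply the XPath expression to get every "definition" element.
$xpath = new DOMXPath($doc);
$definitions = $xpath->query('//definition');

// (3) Iterate over the collection and build one URL per element.
$urls = [];
foreach ($definitions as $definition) {
    $urls[] = 'http://www.mysite.com' . $definition->getAttribute('name') . '.jsp';
}

foreach ($urls as $url) {
    echo $url, "\n";
}
```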

score 0 · Answer 2 · answered Aug 25 '15 at 22:19

In case you haven't solved your problem yet, here is a PHP solution:

<?php
$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
<definition name="/products/mobile" path="/main/something.jsp"> </definition>
</root>
DATA;

$arr = array();
$dom = new DOMDocument('1.0', 'UTF-8');
// Use loadXML, not loadHTML: the input is XML, and the HTML parser
// emits warnings for unknown tags such as <definition>.
$dom->loadXML($xml);

// Select every <definition> element anywhere in the document.
$xpath = new DOMXPath($dom);
$defs = $xpath->query('//definition');

foreach ($defs as $def) {
    // getAttribute() returns '' when the attribute is missing, so skip those.
    $attr = $def->getAttribute('name');
    if ($attr !== '') {
        $arr[] = $attr;
    }
}
print_r($arr);

See IDEONE demo

Result:

Array
(
    [0] => /products/phone
    [1] => /products/cell
    [2] => /products/mobile
)
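The array above still holds only the `name` values, while the question asks for full URLs. One way to finish the job, sketched here with the domain and `.jsp` extension hard-coded from the question as placeholders, is to map each path onto the base URL. Trimming slashes on both sides avoids a doubled `/` if the base is written with a trailing slash. The arrow function needs PHP 7.4 or later.

```php
<?php
// Sample of the array produced by the code above.
$arr = ['/products/phone', '/products/cell', '/products/mobile'];

$base = 'http://www.mysite.com/';   // placeholder domain
$extension = '.jsp';                // placeholder extension

// Strip the trailing slash from the base and the leading slash from each path,
// then join them with exactly one "/".
$urls = array_map(
    fn(string $path): string => rtrim($base, '/') . '/' . ltrim($path, '/') . $extension,
    $arr
);

print_r($urls);
```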
