
I'm in college and new to PHP regular expressions, but I think I have some idea of what I need to do. Basically, I need to create a PHP program that reads XML source code containing several 'stories' and stores their details in a MySQL database. I've managed to write an expression that selects each story, but I need to break it down further to get each element within the story. Here's the XML:

XML

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<latestIssue>

    <issue number="256" />

    <date>
        <day> 21 </day>
        <month> 1 </month>
        <year> 2011 </year>
    </date>

    <story>
        <title> Is the earth flat? </title>
        <author> A. N. Redneck </author>
        <url> http://www.HotStuff.ie/stories/story123456.xml </url>
    </story>

    <story>
        <title> What the actress said to the bishop </title>
        <author> Brated Film Critic </author>
        <url> http://www.HotStuff.ie/stories/story123457.xml </url>
    </story>

    <story>
        <title> What the year has in store </title>
        <author> Stargazer </author>
        <url> http://www.HotStuff.ie/stories/story123458.xml </url>
    </story>

</latestIssue>

So I need to get the title, author and url from each story and add them as a row in my database. Here's what I have so far:

PHP

<?php
    $url = fopen("http://address/to/test.xml", "r");
    $contents = fread($url,10000000);

    $exp = preg_match_all("/<title>(.+?)<\/url>/s", $contents, $matches);

    foreach($matches[1] as $match) {

        // NO IDEA WHAT TO DO FROM HERE
        // $exp2 = "/<title>(.+?)<\/title><author>(.+?)<\/author><url>(.+?)<\/url>/";
        // This is what I had but I'm not sure if it's right or what to do after

    }
?>

I'd really appreciate the help, guys. I've been stuck on this all day and I can't wrap my head around regular expressions at all. Once I've got each story's details I can easily update the database.

EDIT: Thanks for replying, but are you sure this can't be done with regular expressions? The question says "Use regular expressions to analyse the XML and extract the relevant data that you need. Note that information about each story is spread across several lines of XML". Maybe he made a mistake, but I don't see why he'd write it like that if it can't be done this way.

  • Regular Expressions are the [*wrong tool*](http://stackoverflow.com/q/1732348) here. You want to use an XML parser. – gen_Eric Jan 05 '14 at 22:25
  • It's possible with regular expressions, but then again it's also possible to fasten a screw with a sledgehammer. Just because you *can* doesn't mean it's the correct method. I'd say that the assignment is incorrect, or it's trying to make you work harder than you need to. – gen_Eric Jan 05 '14 at 23:32
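Since the assignment insists on regular expressions, here is a minimal sketch of the two-step approach the question is reaching for: first capture each `<story>` block, then search each block separately for its fields. It assumes the XML stays as regular as the sample (no attributes, CDATA, comments or nested stories); `extractStories()` is a hypothetical helper name. Note that the commented-out `$exp2` in the question can't match as written: it expects the tags to sit directly next to each other with no newlines or indentation between them, it has no `s` modifier, and `$matches[1]` has already lost the opening `<title>` and closing `</url>` tags.

```php
<?php
// Minimal sketch: regex-only extraction, assuming the XML is as regular as the
// sample in the question (no attributes, CDATA, comments or nested stories).
// extractStories() is a hypothetical helper, not a library function.
function extractStories(string $xml): array
{
    $stories = [];

    // Step 1: capture the body of every <story>...</story> block.
    // The "s" modifier lets "." match newlines, since each story spans several lines.
    preg_match_all('#<story>(.*?)</story>#s', $xml, $blocks);

    foreach ($blocks[1] as $block) {
        $row = [];
        // Step 2: search each story block separately for its three fields.
        foreach (['title', 'author', 'url'] as $tag) {
            $pattern = '#<' . $tag . '>(.*?)</' . $tag . '>#s';
            // trim() removes the spaces that surround each value in the source.
            $row[$tag] = preg_match($pattern, $block, $m) ? trim($m[1]) : null;
        }
        $stories[] = $row;
    }

    return $stories;
}

// Usage with two stories from the question's sample document.
$xml = <<<XML
<latestIssue>
    <story>
        <title> Is the earth flat? </title>
        <author> A. N. Redneck </author>
        <url> http://www.HotStuff.ie/stories/story123456.xml </url>
    </story>
    <story>
        <title> What the year has in store </title>
        <author> Stargazer </author>
        <url> http://www.HotStuff.ie/stories/story123458.xml </url>
    </story>
</latestIssue>
XML;

foreach (extractStories($xml) as $story) {
    echo $story['title'], ' | ', $story['author'], ' | ', $story['url'], "\n";
}
```

Each `$story` array can then be inserted as one database row. The sketch is only as reliable as its assumption about the input format, which is exactly the fragility the comments above warn about.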

3 Answers


First of all, start using

file_get_contents("UrlHere");

to gather the content from a page.
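One reason to switch: according to the PHP manual, `fread()` on a network stream (such as the one `fopen()` returns for an http:// URL) stops once a packet is available, so a single call may return only part of the document, whereas `file_get_contents()` reads until the end. A minimal sketch, assuming `allow_url_fopen` is enabled for remote URLs; `loadXmlSource()` is a hypothetical helper, and a temporary local file stands in for the URL so the example runs offline.

```php
<?php
// Hypothetical helper: read a whole XML document from a path or URL.
// Remote http:// URLs only work if allow_url_fopen is enabled.
function loadXmlSource(string $source): string
{
    // file_get_contents() keeps reading until EOF, unlike a single fread()
    // call on a network stream, which can return after one packet.
    $contents = file_get_contents($source);
    if ($contents === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $contents;
}

// Example with a local temp file so the sketch runs without a network.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmp, '<latestIssue><issue number="256" /></latestIssue>');
$contents = loadXmlSource($tmp);
unlink($tmp);
echo $contents, "\n";
```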

To parse the XML, use one of the XML parsers built into PHP.

You could also use a third-party XML parser.
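As one illustration (the answer doesn't name a specific parser, so PHP's built-in DOM extension is chosen here), the same fields can be read with `DOMDocument`. The sample string is a shortened copy of the question's XML; with a real file or URL you would call `$doc->load()` instead of `$doc->loadXML()`.

```php
<?php
// Minimal sketch using PHP's DOM extension (one of several built-in XML APIs).
$xml = <<<XML
<latestIssue>
    <story>
        <title> Is the earth flat? </title>
        <author> A. N. Redneck </author>
        <url> http://www.HotStuff.ie/stories/story123456.xml </url>
    </story>
    <story>
        <title> What the year has in store </title>
        <author> Stargazer </author>
        <url> http://www.HotStuff.ie/stories/story123458.xml </url>
    </story>
</latestIssue>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml); // use $doc->load($pathOrUrl) to read from a file or URL

$rows = [];
foreach ($doc->getElementsByTagName('story') as $story) {
    $row = [];
    foreach (['title', 'author', 'url'] as $tag) {
        // Take the first matching child element; textContent keeps the
        // surrounding spaces from the source, so trim them.
        $node = $story->getElementsByTagName($tag)->item(0);
        $row[$tag] = $node !== null ? trim($node->textContent) : null;
    }
    $rows[] = $row;
}

print_r($rows);
```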

Jeroen Ketelaar

Regular expressions are not the correct tool to use here. You want to use an XML parser. I like PHP's SimpleXML:

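<?php
// Third argument TRUE tells SimpleXMLElement the first argument is a path/URL.
$sXML = new SimpleXMLElement('http://address/to/test.xml', 0, TRUE);
$stories = $sXML->story;
foreach($stories as $story){
    // Cast to string and trim the spaces that surround each value in the source.
    $title = trim((string)$story->title);
    $author = trim((string)$story->author);
    $url = trim((string)$story->url);
    // Insert $title, $author and $url into the database here.
}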
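The loop above reads each story's values; the question's last step is to store them as rows in MySQL. A hedged sketch using PDO prepared statements: the `stories` table, its columns and the `saveStories()` helper are assumptions for illustration, and an in-memory SQLite database stands in for MySQL so the example runs on its own (it needs the pdo_sqlite extension). With MySQL, mainly the connection line would change.

```php
<?php
// Hypothetical helper: insert one row per <story> using a prepared statement.
// Assumes a table like: CREATE TABLE stories (title TEXT, author TEXT, url TEXT)
function saveStories(PDO $pdo, SimpleXMLElement $sXML): int
{
    $stmt = $pdo->prepare('INSERT INTO stories (title, author, url) VALUES (?, ?, ?)');
    $count = 0;
    foreach ($sXML->story as $story) {
        // Placeholders keep the values out of the SQL string itself.
        $stmt->execute([
            trim((string) $story->title),
            trim((string) $story->author),
            trim((string) $story->url),
        ]);
        $count++;
    }
    return $count;
}

// For MySQL you would connect with something like:
// $pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', $user, $pass);
// An in-memory SQLite database is used here only so the sketch runs standalone.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE stories (title TEXT, author TEXT, url TEXT)');

$sXML = new SimpleXMLElement(
    '<latestIssue><story><title> Is the earth flat? </title>'
    . '<author> A. N. Redneck </author>'
    . '<url> http://www.HotStuff.ie/stories/story123456.xml </url></story></latestIssue>'
);
$inserted = saveStories($pdo, $sXML);
echo "Inserted $inserted row(s)\n";
```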
gen_Eric

You should never use regular expressions to parse an XML document. (OK, "never" is a strong word: in some rare cases a regex can be the better choice, but not in your case.)
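A small illustration of why, using made-up sample strings: both are valid XML for the same title, but a naive pattern like the ones in the question leaves the `&amp;` entity undecoded in the first case and finds nothing at all once the element has an attribute, while SimpleXML handles both.

```php
<?php
// Two valid XML encodings of the same title (sample strings made up for illustration).
$plain    = '<story><title>Fish &amp; Chips</title></story>';
$withAttr = '<story><title lang="en">Fish &amp; Chips</title></story>';

$pattern = '#<title>(.*?)</title>#s';

// The naive pattern returns the raw text, entity and all: "Fish &amp; Chips".
preg_match($pattern, $plain, $m);
$regexPlain = $m[1];

// It finds no match once the element carries an attribute: returns 0.
$regexAttrMatched = preg_match($pattern, $withAttr);

// An XML parser handles both cases and decodes the entity: "Fish & Chips".
$parsedPlain = (string) (new SimpleXMLElement($plain))->title;
$parsedAttr  = (string) (new SimpleXMLElement($withAttr))->title;

var_dump($regexPlain, $regexAttrMatched, $parsedPlain, $parsedAttr);
```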

Since you're only reading the document, I suggest using the SimpleXML class with XPath queries. For example:

$ cat test.php 
#!/usr/bin/php
<?php
    function xpathValueToString(SimpleXMLElement $xml, $xpath){
        $arrayXpath = $xml->xpath($xpath);
        return ($arrayXpath) ? trim((string) $arrayXpath[0]) : null;
    }

    $xml = new SimpleXMLElement(file_get_contents("test.xml"));
    $arrayXpathStories = $xml->xpath("/latestIssue/story");

    foreach ($arrayXpathStories as $story){
        echo "Title : " . xpathValueToString($story, 'title') . "\n";
        echo "Author : " . xpathValueToString($story, 'author') . "\n";
        echo "URL : " . xpathValueToString($story, 'url') . "\n\n"; 
    }
?>
$ ./test.php 
Title : Is the earth flat?
Author : A. N. Redneck
URL : http://www.HotStuff.ie/stories/story123456.xml

Title : What the actress said to the bishop
Author : Brated Film Critic
URL : http://www.HotStuff.ie/stories/story123457.xml

Title : What the year has in store
Author : Stargazer
URL : http://www.HotStuff.ie/stories/story123458.xml
Idriss Neumann