
I'm trying to build an XML feed from this HTML table using PHP Simple HTML DOM Parser.

<table>
<tr><td colspan="5"><strong>Saturday October 15 2011</strong></td></tr>

<tr><td>Team 1</td>     <td>vs</td>     <td>Team 7</td> <td>3:00 pm</td></tr>
<tr><td>Team 2</td>     <td>vs</td>     <td>Team 12</td>    <td>3:00 pm</td></tr>
<tr><td>Team 3</td>     <td>vs</td>     <td>Team 8</td> <td>3:00 pm</td></tr>
<tr><td>Team 4</td>     <td>vs</td>     <td>Team 10</td>    <td>3:00 pm</td></tr>
<tr><td>Team 5</td>     <td>vs</td>     <td>Team 11</td>    <td>3:00 pm</td></tr>

<tr><td colspan="5"><strong>Monday October 17 2011</strong></td></tr>

<tr><td>Team 6</td>     <td>vs</td>     <td>Team 9</td> <td>7:45 pm</td></tr>

<tr><td colspan="5"><strong>Saturday October 22 2011</strong></td></tr>

<tr><td>Team 7</td>     <td>vs</td>     <td>Team 12</td>    <td>3:00 pm</td></tr>
<tr><td>Team 1</td>     <td>vs</td>     <td>Team 2</td> <td>3:00 pm</td></tr>
<tr><td>Team 8</td>     <td>vs</td>     <td>Team 4</td> <td>3:00 pm</td></tr>
<tr><td>Team 3</td>     <td>vs</td>     <td>Team 6</td> <td>3:00 pm</td></tr>
<tr><td>Team 9</td>     <td>vs</td>     <td>Team 5</td> <td>3:00 pm</td></tr>
<tr><td>Team 10</td>        <td>vs</td>     <td>Team 11</td>    <td>3:00 pm</td></tr>
</table>

What I am aiming to do is extract each date and then the rows that follow it, up until the next date, so that I can build an XML node like this for each date:

<matchday date="Saturday October 15 2011">
    <fixture>
        <hometeam>Team 1</hometeam>
        <awayteam>Team 7</awayteam>
        <kickoff>3:00 pm</kickoff>
    </fixture>
    <fixture>
        <hometeam>Team 2</hometeam>
        <awayteam>Team 12</awayteam>
        <kickoff>3:00 pm</kickoff>
    </fixture>
</matchday>

So far I have extracted each of the dates from the HTML and built their respective XML nodes:

$dateNodes = $html->find('table tr td[colspan="5"] strong');

foreach($dateNodes as $date){
    echo '<matchday date="'.trim($date->innertext).'">';
    // FIXTURES

    // END FIXTURES
    echo '</matchday>';
}

How would I go about getting the team names and kickoff time for each fixture, up until the next matchday date?

ChrisMJ
  • You have not accepted an answer yet. Can you please clarify what you are looking for in an answer and why the given answers do not satisfy you? – Gordon Dec 24 '11 at 09:55

1 Answer


Instead of SimpleHtmlDom (which I believe is a craptaculous library), you can use an XSLT transformation and PHP's native XSLT processor:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:template match="/">
    <matchdays>
      <xsl:for-each select="table/tr[td[@colspan=5]]">
        <matchday>
          <xsl:attribute name="date">
            <xsl:value-of select="td/strong"/>
          </xsl:attribute>
          <xsl:for-each select="following-sibling::tr[
            not(td[@colspan]) and 
            preceding-sibling::tr[td[@colspan]][1] = current()
          ]">
            <fixture>
              <hometeam><xsl:value-of select="td[1]"/></hometeam>
              <awayteam><xsl:value-of select="td[3]"/></awayteam>
              <kickoff><xsl:value-of select="td[4]"/></kickoff>
            </fixture>
          </xsl:for-each>                   
        </matchday>
      </xsl:for-each>
    </matchdays>
  </xsl:template>   
</xsl:stylesheet>

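Then just use the code given in the example at http://php.net/manual/en/xsltprocessor.transformtoxml.php to transform your HTML into XML. Note that load() expects well-formed XML, so the source table must not contain stray or unclosed tags: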

$xml = new DOMDocument;
$xml->load('YourSourceFile.xml');
$xsl = new DOMDocument;
$xsl->load('YourStyleSheet.xsl');
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
echo $proc->transformToXML($xml);

Demo at Codepad
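
If the source is real-world HTML rather than well-formed XML, load() may refuse it. Below is a minimal sketch of one way around that, assuming the stylesheet above is saved to a file and that the first table in the page is the one you want. The helper names htmlTableToXmlDoc() and transformHtmlTable() are made up for this example and are not part of any library. The idea is to parse leniently with loadHTML() and copy the table into a fresh document, so that the stylesheet's table/tr paths still start at the root.

```php
<?php
// Hypothetical helper: parse sloppy HTML and return a document whose
// root element is the first <table>, so "table/tr" matches from "/".
function htmlTableToXmlDoc(string $html): DOMDocument
{
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $src = new DOMDocument;
    $src->loadHTML($html);
    libxml_clear_errors();

    $table = $src->getElementsByTagName('table')->item(0);
    if ($table === null) {
        throw new RuntimeException('No <table> found in the HTML');
    }

    $doc = new DOMDocument('1.0', 'utf-8');
    $doc->appendChild($doc->importNode($table, true)); // deep copy of the table
    return $doc;
}

// Hypothetical helper: run the stylesheet at $xslPath over the table.
// Requires PHP's xsl extension to be installed.
function transformHtmlTable(string $html, string $xslPath): string
{
    $xsl = new DOMDocument;
    if (!$xsl->load($xslPath)) {
        throw new RuntimeException('Could not load stylesheet');
    }
    $proc = new XSLTProcessor;
    $proc->importStylesheet($xsl);

    $result = $proc->transformToXml(htmlTableToXmlDoc($html));
    if (!is_string($result)) {
        throw new RuntimeException('Transformation failed');
    }
    return $result;
}
```

Usage would look like echo transformHtmlTable(file_get_contents('YourHtmlFile.html'), 'YourStyleSheet.xsl'); where both file names are placeholders.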


In addition to using XSLT, you can also do it with PHP's native DOM extension:

$xml = new DOMDocument;
$xml->loadHtmlFile('YourHtmlFile.xml');
$xp = new DOMXPath($xml);   
$new = new DOMDocument('1.0', 'utf-8');
$new->appendChild($new->createElement('matchdays'));
foreach ($xp->query('//table/tr/td[@colspan=5]/strong') as $gameDate) {
    $matchDay = $new->createElement('matchday');
    $matchDay->setAttribute('date', $gameDate->nodeValue);
    foreach ($xp->query(
        sprintf(
            '//tr[
                not(td[@colspan]) and
                preceding-sibling::tr[td[@colspan]][1]/td/strong/text() = "%s"
            ]',
            $gameDate->nodeValue
        )
    ) as $gameData) {
        $tds = $gameData->getElementsByTagName('td');
        $fixture = $matchDay->appendChild($new->createElement('fixture'));
        $fixture->appendChild(
            $new->createElement('hometeam', $tds->item(0)->nodeValue)
        );
        $fixture->appendChild(
            $new->createElement('awayteam', $tds->item(2)->nodeValue)
        );
        $fixture->appendChild(
            $new->createElement('kickoff', $tds->item(3)->nodeValue)
        );
    }
    $new->documentElement->appendChild($matchDay);
}
$new->formatOutput = true;
echo $new->saveXML();

Demo at Codepad
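
The DOM version above runs a second XPath query per date and matches rows by the date string, which would break if a date contained a double quote. A possible alternative, shown here only as a sketch, is to walk the rows once and remember the current matchday. buildMatchdays() is a hypothetical function name, and the column positions (0, 2, 3) are taken from the table in the question.

```php
<?php
// Hypothetical helper: build the <matchdays> document in a single pass.
function buildMatchdays(string $html): DOMDocument
{
    libxml_use_internal_errors(true);   // tolerate sloppy HTML such as a stray </td>
    $src = new DOMDocument;
    $src->loadHTML($html);
    libxml_clear_errors();

    $xp  = new DOMXPath($src);
    $out = new DOMDocument('1.0', 'utf-8');
    $out->formatOutput = true;
    $root = $out->appendChild($out->createElement('matchdays'));

    $matchDay = null;
    foreach ($xp->query('//table//tr') as $row) {
        // A row with a colspan cell holding <strong> is a date header.
        $header = $xp->query('td[@colspan]/strong', $row)->item(0);
        if ($header !== null) {
            $matchDay = $root->appendChild($out->createElement('matchday'));
            $matchDay->setAttribute('date', trim($header->textContent));
            continue;
        }

        $tds = $xp->query('td', $row);
        if ($matchDay === null || $tds->length < 4) {
            continue; // row before the first date, or not a fixture row
        }

        $fixture = $matchDay->appendChild($out->createElement('fixture'));
        foreach (['hometeam' => 0, 'awayteam' => 2, 'kickoff' => 3] as $name => $i) {
            $el = $fixture->appendChild($out->createElement($name));
            // createTextNode escapes characters such as & in team names
            $el->appendChild($out->createTextNode(trim($tds->item($i)->textContent)));
        }
    }
    return $out;
}
```

Calling echo buildMatchdays(file_get_contents('YourHtmlFile.html'))->saveXML(); (the file name is a placeholder) should then print the feed. Text is added through createTextNode() rather than as the second argument of createElement(), because the latter does not escape ampersands.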

Gordon
  • Is it possible to load an external URL with $xml->loadHtmlFile('YourHtmlFile.xml');, i.e. if I wanted to parse another website to retrieve its data? – ChrisMJ Oct 14 '11 at 23:20
  • @ChrisMJ Yes. Please refer to the [manual for DOM for details](http://de2.php.net/manual/de/book.dom.php) – Gordon Oct 15 '11 at 07:36
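
To illustrate the exchange above, here is a minimal sketch of loading a remote page into a DOMDocument. It assumes allow_url_fopen is enabled in php.ini, which PHP's stream functions need in order to open http:// URLs. loadRemoteHtml() is a hypothetical helper name and the URL in the comment is a placeholder.

```php
<?php
// Hypothetical helper: fetch a page (URL or local path) and parse it leniently.
function loadRemoteHtml(string $location): DOMDocument
{
    // file_get_contents() accepts URLs only when allow_url_fopen is on.
    $html = @file_get_contents($location);
    if ($html === false || $html === '') {
        throw new RuntimeException("Could not read $location");
    }

    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

// Example call (placeholder URL, not a real fixtures page):
// $xml = loadRemoteHtml('https://example.com/fixtures.html');
// $xp  = new DOMXPath($xml);
```

The returned document can be passed to DOMXPath in the same way as the one produced by loadHtmlFile() in the answer.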