
I'm trying to use XPath to parse an HTML page from a website.

I need to get a result in the format:

Time of the program : Program name

For example:

1.00PM : Ye Hai Mohabbatein

I am using the following code (as shown here) to obtain it, but it is not working.

<?php

libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTMLFile("www.starplus.in/schedule.aspx");
$xpath = new DomXPath($dom);
$nodes = $xpath->query("//table");
foreach ($nodes as $i => $node) {
    echo "hy";
    echo "Node($i): ", $node->nodeValue, "\n";
}

?>

I would be thankful if anybody could help me out with this issue.

akshaivk
  • In the future, when you write that something "is not working", please tell what it did and how that differs from what you wanted it to do. – LarsH Oct 09 '14 at 10:24
  • The main bug was the invalid URL, which was missing "http://" at the start. – Alf Eaton Oct 13 '14 at 09:14

1 Answer


Basically, just target the div/table that contains the name of the show and the time slot.

Rough example:

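// the site does not seem to respond properly when no user agent is sent
$ch = curl_init('http://www.starplus.in/schedule.aspx');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$page = curl_exec($ch);
if ($page === false) {
    die('Request failed: ' . curl_error($ch)); // stop early if the fetch failed
}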

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$shows = array();
$tables = $xpath->query("//div[@class='sech_div_bg']/table"); // target that table

foreach ($tables as $table) {
    $time_slot = $xpath->query('./tr[1]/td/span', $table)->item(0)->nodeValue;
    $show_name = $xpath->query('./tr[3]/td/span', $table)->item(0)->nodeValue;
    $shows[] = array('time_slot' => $time_slot, 'show_name' => $show_name);
    echo "$time_slot - $show_name <br/>";
}

// echo '<pre>';
// print_r($shows);
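
If you want to try the XPath part without hitting the live site, here is a minimal sketch that runs the same queries against an inline HTML string and prints the "Time : Program name" format from the question. The sample markup and the `extract_schedule()` helper are hypothetical; the markup is shaped only to match the XPath expressions above, so the real page may well differ.

```php
<?php
// Hypothetical sample markup, shaped only to match the XPath expressions above;
// the structure of the real page is an assumption and may differ.
$html = <<<HTML
<html><body>
<div class="sech_div_bg">
  <table>
    <tr><td><span>1.00PM</span></td></tr>
    <tr><td><img src="show.jpg" alt=""></td></tr>
    <tr><td><span>Ye Hai Mohabbatein</span></td></tr>
  </table>
</div>
</body></html>
HTML;

// Hypothetical helper: returns one "time : name" string per schedule table.
function extract_schedule(string $html): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $lines = [];
    foreach ($xpath->query("//div[@class='sech_div_bg']/table") as $table) {
        // item(0) returns null when nothing matched, so check before reading nodeValue
        $time = $xpath->query('./tr[1]/td/span', $table)->item(0);
        $name = $xpath->query('./tr[3]/td/span', $table)->item(0);
        if ($time === null || $name === null) {
            continue; // skip tables that do not have the expected layout
        }
        $lines[] = trim($time->nodeValue) . ' : ' . trim($name->nodeValue);
    }
    return $lines;
}

$schedule = extract_schedule($html);
echo implode("\n", $schedule), "\n"; // prints "1.00PM : Ye Hai Mohabbatein"
```

The null checks are there because the loop above would raise an error on any table that lacks the expected rows; skipping those tables is just one possible choice.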
Kevin