I have a newsfeed link from an Indian newspaper as follows:
https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml
I am trying to extract some information from it using PHP and SimpleXML:
$feedURL="https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml";
$array = get_headers($feedURL);
$statusCode = $array[0];
echo('<br>'.$statusCode.'<br>');
if (strpos($statusCode, "404") === false) { // strict comparison: strpos() returns false only when "404" is absent
echo('Reading <a href="' . $feedURL . '">' . $feedURL . '</a><br>');
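$raw = file_get_contents($feedURL);
$out = htmlspecialchars($raw, ENT_QUOTES);
echo($out);
// Detect the feed type on the raw XML: htmlspecialchars() turns every "<" into "&lt;", so "<feed " and "<rss" can never match in $out.
if (stripos($raw, "<feed ") !== false) {
$feedType = 'ATOM';
$countATOM += 1;
} else if (stripos($raw, "<rss") !== false) {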
$feedType = 'RSS';
$countRSS += 1;
} else {
$feedType = 'UNREADABLE';
$countUNREADABLE += 1;
}
echo('<br>' . $feedType . '<br>');
echo('<br>-------------------------------------------------------------------------<br>');
if ($feedType == 'ATOM') {
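libxml_use_internal_errors(true); // without this, libxml_get_errors() below returns an empty array
$xmlOut = simplexml_load_string(file_get_contents($feedURL));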
echo($xmlOut.'<br>-------------------------------------------------------------------------<br>');
if ($xmlOut === false) {
echo("Failed loading XML: ");
foreach (libxml_get_errors() as $error) {
echo ("<br>" . $error->message);
}
} else {
foreach ($xmlOut->entry as $entry) {
if (isset($entry->title) && isset($entry->link) && isset($entry->updated) && isset($entry->summary)){
// Read from the current $entry; $xmlOut->entry always refers to the first entry only.
$title=$entry->title;
$link=$entry->link['href']; // was "$link=$title=...", which overwrote $title with the link
$updated=$entry->updated;
$summary=$entry->summary;
if(isImportantNews($title) || isImportantNews($summary)){
$insertNewsCmd=$insertNewsCmd
."('".$link."',"
."'".stripSpecialChars($title)."',"
."'".setDate($updated)."'),";
}
}
echo($entry->updated . "<br>");
}
}
} elseif ($feedType == 'RSS') {
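libxml_use_internal_errors(true); // without this, libxml_get_errors() below returns an empty array
$xmlOut = simplexml_load_string(file_get_contents($feedURL));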
print_r($xmlOut);
echo('<br>-------------------------------------------------------------------------<br>');
if ($xmlOut === false) {
echo("Failed loading XML: ");
foreach (libxml_get_errors() as $error) {
echo ("<br>" . $error->message);
}
} else {
foreach ($xmlOut->channel->item as $item) {
if (isset($item->title) && isset($item->link) && isset($item->description) && isset($item->pubDate)) {
$title = $item->title;
$link = $item->link;
$descr = $item->description;
$pubDate = $item->pubDate;
echo($title.'<br>'.$link.'<br>'.$descr.'<br>');
echo('<br>-------------------------------------------------------------------------<br>');
if(isImportantNews($title) || isImportantNews($descr)){
$insertNewsCmd=$insertNewsCmd
."('".$link."',"
."'".stripSpecialChars($title)."',"
."'".setDate($pubDate)."'),";
}
echo($item->pubDate . "<br>"); // was $entries, which is never defined
}
}
}
} else {
continue;
}
break;
} else {
echo($feedURL . ' encountered problems being read...' . '<br>');
}
Basically, the program determines whether the feed at the above link is ATOM or RSS, extracts each item's title and summary (or description), and uses the isImportantNews() function to decide whether the item is important news. If so, I store it in a database.
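The ATOM/RSS check described above could also be done on the parsed document instead of by searching the text. The following is only a minimal sketch: detectFeedType() is a hypothetical helper that is not part of my script, and it assumes the feed's root element is named either "feed" (ATOM) or "rss".

```php
<?php
// Hypothetical helper (not in the original script): classify a feed by the
// name of its root element rather than by searching the raw text.
function detectFeedType(string $xml): string
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return 'UNREADABLE'; // not well-formed XML (e.g. an HTML error page)
    }

    // Assumption: ATOM feeds have a <feed> root, RSS feeds an <rss> root.
    switch (strtolower($doc->getName())) {
        case 'feed':
            return 'ATOM';
        case 'rss':
            return 'RSS';
        default:
            return 'UNREADABLE';
    }
}

// Usage (commented out so the sketch has no side effects):
// $feedType = detectFeedType(file_get_contents($feedURL));
```

Checking the root element avoids false matches when the string "<rss" or "<feed " happens to appear elsewhere in the response, for example inside an error page.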
My problem is that if I open the above link directly in a browser, I can see the feed without any issues, but reading it with the above code returns an HTTP 403 Forbidden status code.
Why is this happening, and is there a way around it? Being able to open the link directly suggests that the 403 may be caused by the programmatic access attempt, but I am not certain about that. I also tried the following ways to read it, all with the same failure:
echo('read file ####################################################################################################');
echo readfile("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml"); //needs "allow_url_fopen" enabled; readfile() outputs the content itself, so echo only appends the byte count
echo('<br>include ####################################################################################################');
echo include("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml"); //needs "Allow_url_include" enabled
echo('<br>file get contents ####################################################################################################');
echo file_get_contents("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml");
echo('<br>stream get contents####################################################################################################');
echo stream_get_contents(fopen('https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml', "r")); //you may use "r" instead of "rb" //needs "Allow_url_fopen" enabled
echo('<br>get remote data ####################################################################################################');
echo get_remote_data('https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml');
$feedURL = "https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml";
$out = htmlspecialchars(file_get_contents($feedURL), ENT_QUOTES);
echo($out);
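One more variant that could be tried is sketched below. It rests on an unverified assumption: that the server rejects requests that do not carry a browser-like User-Agent header (PHP's HTTP wrapper sends none unless the user_agent setting is configured). The function names buildFeedContext(), parseStatusCode() and fetchFeed(), and the User-Agent string in the usage comment, are hypothetical and only for illustration. If the 403 has a different cause, this will not help.

```php
<?php
// Hypothetical helper: build a stream context that sends a User-Agent header.
function buildFeedContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            'timeout'       => 10,
            // Return the response body even for 4xx/5xx so it can be inspected.
            'ignore_errors' => true,
        ],
    ]);
}

// Extract the numeric status from a line such as "HTTP/1.1 403 Forbidden".
// Returns 0 when the line does not look like an HTTP status line.
function parseStatusCode(string $statusLine): int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) === 1
        ? (int) $m[1]
        : 0;
}

// Fetch a URL with the given User-Agent; returns [statusCode, body].
function fetchFeed(string $url, string $userAgent): array
{
    $body = @file_get_contents($url, false, buildFeedContext($userAgent));
    // The HTTP wrapper fills $http_response_header in the local scope.
    // Entry 0 is the first status line, which is a 3xx if a redirect occurred.
    $statusLine = $http_response_header[0] ?? '';
    return [parseStatusCode($statusLine), $body === false ? '' : $body];
}

// Usage (commented out because it performs a network request):
// [$status, $body] = fetchFeed($feedURL, 'Mozilla/5.0 (compatible; MyFeedReader/1.0)');
// echo $status . '<br>' . htmlspecialchars($body, ENT_QUOTES);
```

Comparing the status returned with and without the header would at least show whether the User-Agent is what the server is reacting to.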
Any help or insight would be most appreciated.