
I have an RSS XML file that is pretty large, with more than 700 nodes. I am using the XMLReader Iterator library to parse it and display the results 10 per page.

This is my sample code for parsing xml:

<?php
require('xmlreader-iterators.php');

$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);

$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();

foreach ($itemIterator as $item) {
    $xml     = $item->asSimpleXML();
    $items[] = array(
        'title'     => (string)$xml->title,
        'link'      => (string)$xml->link
    );
}

// Logic for displaying the array values, based on the current page. 
// page = 1 means $items[0] to $items[9]

for ($i = 0; $i <= 9; $i++) {
    echo '<a href="' . $items[$i]['link'] . '">' . $items[$i]['title'] . '</a><br>';
}
?>

But the problem is that, for every page, I am parsing the entire XML file and then displaying only the corresponding page's results: if the page is 1, displaying nodes 1 to 10, and if the page is 5, displaying nodes 41 to 50.

This causes a delay in displaying the data. Is it possible to read just the nodes corresponding to the requested page? For the first page, I could read the nodes at positions 1 to 10, instead of parsing the whole XML file and then displaying the first 10 nodes. In other words, can I apply a limit while parsing an XML file?
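Ideally, I am looking for something like the following (a rough sketch only, reusing the xmlreader-iterators API from my code above), where parsing stops as soon as the requested page's items have been collected:

require('xmlreader-iterators.php');

$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$perPage = 10;
$offset  = ($page - 1) * $perPage;

$reader = new XMLReader();
$reader->open('http://www.example.com/rss.xml');

$items = array();
$index = 0;
foreach (new XMLElementIterator($reader, 'item') as $item) {
    // Skip items before the requested page; the reader still has to
    // walk past them, but no SimpleXML objects are built for them.
    if ($index++ < $offset) {
        continue;
    }
    $xml     = $item->asSimpleXML();
    $items[] = array(
        'title' => (string) $xml->title,
        'link'  => (string) $xml->link
    );
    // Stop parsing entirely once the page is full, so the rest of
    // the file is never read.
    if (count($items) === $perPage) {
        break;
    }
}
$reader->close();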

I came across this answer by Gordon that addresses a similar question, but it uses SimpleXML, which is not recommended for parsing large XML files.

shasi kanth
  • Give your XML file URL – Padmanathan J Sep 05 '13 at 10:58
  • This is my actual XML file: http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml, which is almost similar in structure to the Yahoo feed URL: http://sports.yahoo.com/mlb/teams/bos/rss.xml – shasi kanth Sep 10 '13 at 06:32
  • Thanks for all your answers. I feel that I need to increase the values of max_execution_time and memory_limit on my Zend Apache server. – shasi kanth Sep 15 '13 at 16:23

4 Answers


Use array_slice to extract the portion of the array:

require('xmlreader-iterators.php');

$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);

$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();

// Default to page 1 when no valid page parameter is given
$curr_page = (isset($_GET['page']) && (int) $_GET['page'] > 0) ? (int) $_GET['page'] : 1;

// Items per page
$max = 10;

foreach ($itemIterator as $item) {
    $xml     = $item->asSimpleXML();
    $items[] = array(
        'title' => (string) $xml->title,
        'link'  => (string) $xml->link
    );
}

// Take the length of the array
$len = count($items);

// Get the number of pages
$pages = ceil($len / $max);

// Calculate the starting point
$start = ($curr_page - 1) * $max;

// Return the portion of results for the current page
$arrayItem = array_slice($items, $start, $max);

// Iterate over the slice itself, so a short last page works too
foreach ($arrayItem as $row) {
    echo '<a href="' . $row['link'] . '">' . $row['title'] . '</a><br>';
}

// Paging links
$str = array();
for ($i = 1; $i <= $pages; $i++) {
    if ($i === $curr_page) {
        // current page
        $str[] = sprintf('<span style="color:red">%d</span>', $i);
    } else {
        $str[] = sprintf('<a href="?page=%d" style="color:green">%d</a>', $i, $i);
    }
}
echo implode('', $str);
Shushant

Use a cache in this case, since you cannot partially parse an XML file.
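For example, a minimal file-based cache could look like this (the cache path and the 15-minute TTL are just placeholder choices):

<?php
$feedUrl   = 'http://www.example.com/rss.xml';
$cacheFile = __DIR__ . '/rss-cache.xml';
$ttl       = 900; // refresh the local copy every 15 minutes

if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $ttl) {
    // Re-download the feed; keep the stale copy if the download fails
    $fresh = file_get_contents($feedUrl);
    if ($fresh !== false) {
        file_put_contents($cacheFile, $fresh);
    }
}

// Parse the local copy instead of fetching the remote URL on every page
$reader = new XMLReader();
$reader->open($cacheFile);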

itscaro

Check this

<?php
// Read the requested page number (pages are 1-based in the URL)
if (isset($_GET['page']) && $_GET['page'] != "") {
    $startPage = (int) $_GET['page'] - 1;
} else {
    $startPage = 0;
}
$perPage = 10;
$currentRecord = 0;
$xml = new SimpleXMLElement('http://sports.yahoo.com/mlb/teams/bos/rss.xml', 0, true);

foreach ($xml->channel->item as $value) {
    $currentRecord += 1;

    // <= so that a full $perPage items are printed per page
    if ($currentRecord > ($startPage * $perPage) && $currentRecord <= ($startPage * $perPage + $perPage)) {
        echo "<a href=\"$value->link\">$value->title</a>";
        echo "<br>";
    }
}

// and the pagination:
for ($i = 1; $i <= ceil($currentRecord / $perPage); $i++) {
    echo "<a href='xmlpagination.php?page=" . $i . "'>" . $i . "</a>";
}
?>

Updated

Check this link:

http://www.phpclasses.org/package/5667-PHP-Parse-XML-documents-and-return-arrays-of-elements.html

Padmanathan J
  • This is working for an XML file that contains some 200 nodes. But if I try this code with a large XML file (as is the requirement), I am getting an internal server error. – shasi kanth Sep 08 '13 at 15:01

You can use DOM and XPath. It should be much faster, since XPath allows you to select nodes by their position in a list.

<?php
$string = file_get_contents("http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml");

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($string);
$string = ""; // free the raw string; the DOM keeps its own copy

$xpath = new DOMXPath($dom);

$channel = $dom->getElementsByTagName('channel')->item(0);

$numItems = $xpath->evaluate("count(item)", $channel);
// get your paging logic here; note that XPath position() is 1-based

$start = 10;
$end = 20;

$items = $xpath->evaluate("item[position() >= $start and not(position() > $end)]", $channel);
$count = $start;
foreach ($items as $item) {
    print_r("\r\n_____Node number $count ");
    print_r($item->nodeName);
    foreach ($item->childNodes as $childNode) {
        print_r($childNode->nodeValue);
    }
    $count++;
}
Paolo Mioni
  • I tried this code. For a large XML file (700 items), even this gives an internal server error. And there should be a **break;** after the last print_r statement; otherwise the items are printed multiple times. – shasi kanth Sep 09 '13 at 15:20
  • What sort of Internal Server Error did you get? Have you got the error code? – Paolo Mioni Sep 09 '13 at 15:32
  • The print_r is there just to show you what can be done with the various nodes. The results depends on the actual content feed. – Paolo Mioni Sep 09 '13 at 15:35
  • You can view the actual execution of your code, by parsing this large xml file: http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml – shasi kanth Sep 09 '13 at 17:42
  • It works for me here. If you get an error, you should check your web server's error log. Your script either throws an out of memory error, because the file is too large, or a maximum execution time error, because the feed takes a long time to be downloaded. I've edited the code so that after loading the feed into the XML Parser it frees memory by resetting the $string variable. I've also fixed the print_r and it works fine now. – Paolo Mioni Sep 10 '13 at 08:09
  • If you need to parse a long RSS feed on a remote server, it is advisable to cache it locally so that you don't have to fetch it every time you need to output a new page. – Paolo Mioni Sep 10 '13 at 08:12
  • **max_execution_time** was the culprit in this case. It was just 30 seconds in my Zend Apache's php.ini file. Increasing it to 120 seconds did the job. Thanks for the idea of caching the xml feed. It is always recommended for large xml files. Awarding the bounty to this answer for its priority. – shasi kanth Sep 15 '13 at 16:18
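
For reference, if editing php.ini is not convenient, both limits can also be raised per script (the values below are just examples and should be tuned to the feed size and server):

<?php
// Example values only; adjust to your feed size and server
ini_set('max_execution_time', '120'); // or: set_time_limit(120);
ini_set('memory_limit', '256M');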