1

I've been working on getting an RSS feed setup and with the help of some people here I got it done. However, I really need some advice from some of you more experienced coders to show me what needs to be changed to keep the same functionality but speed up the page load.

It looks 30 RSS items and gets the URLs from my MySQL database. The problem is that it randomly selects 30 rows out of over 100 million rows in that table. That is what it's supposed to do, but with their being so many rows in the table, it's really slowing down the script and I need help!

<?php header("Content-type: text/xml"); ?>
<?php echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"; ?>
<?php include('directory/database.php'); ?>
<rss version="2.0">
<channel>
  <title>Website Reviews</title>
  <link>http://www.mywebsite.com</link>
  <description>Professional Services</description>
  <pubDate><?echo date('Y/m/d H:i:s');?></pubDate>

<?php
    foreach( range( 1, 30 ) as $i ):
$number = mt_rand( 1, 141754641 );
$query="SELECT * FROM `list` LIMIT $number , 1";
$result = mysql_query($query);
if($result == false)
{
   user_error("Query failed: " . mysql_error() . "<br />\n$query");
}
elseif(mysql_num_rows($result) == 0)
{
   echo "<p>Sorry, we're updating this section of our website right now!</p>\n";
}
else
{
   while($query_row = mysql_fetch_assoc($result))
   {
      foreach($query_row as $key => $domain)
      {
         echo "$value";
      }
   }
}  
?>
<item>
    <title><?php echo $domain; ?> REVIEW</title>
    <pubDate><?echo date('Y/m/d H:i:s');?></pubDate>
    <link>http://www.mywebsite.com/review/<?php echo $domain; ?></link>
    <description>Looking for a review on <?php echo $domain; ?>?  We've got it!</description>
</item>
<?php endforeach; ?>

</channel>
</rss>

Thanks in advance for any help that anyone can give!

  • How long does it take for the query to execute? – JP_ Nov 25 '12 at 06:31
  • I guess selecting ORDER BY RAND() LIMIT 30 is too slow? – Popnoodles Nov 25 '12 at 06:31
  • I've tested it a few times and every time was over 5 minutes, generally over 10 minutes though. Since it's an RSS feed obviously it needs to load much faster, I just can't figure out how to optimize it so that it gets that info from the database faster. – user1850695 Nov 25 '12 at 06:32
  • I've tried `ORDER BY RAND() LIMIT 30` and it didn't seem to speed it up at all either :/ – user1850695 Nov 25 '12 at 06:36
  • 1
    @popnoodles: `ORDER BY RAND()` is a very bad idea. See [alternatives to `ORDER BY RAND()`](http://stackoverflow.com/questions/1823306/mysql-alternatives-to-order-by-rand). – DCoder Nov 25 '12 at 06:37
  • I thought RAND() might be a bad idea. How long does it take to SELECT * FROM `list` LIMIT 0,30? – Popnoodles Nov 25 '12 at 06:39

3 Answers3

1

You can still select all 30 at once. It shouldn't be that slow to get 30 records.

$numbers=array();
foreach( range( 1, 30 ) as $i ):
    $numbers[] = mt_rand( 1, 141754641 );
endforeach;

$query="SELECT * FROM `list` WHERE `whatever_primary_key_is` IN (".implode(',', $numbers).")";
Popnoodles
  • 28,090
  • 2
  • 45
  • 53
1

Limit has to do table scans, so what you want to do is use indexes to your advantage. So first, let's add an autoincrement ID field to the table named "id".

Then,

<?php
$result = array();
$maxRow = mysql_fetch_assoc(mysql_query("SHOW TABLE STATUS LIKE 'list';"));
$max = $maxRow["Auto_increment"];
$minRow = mysql_fetch_assoc(mysql_query("SELECT id FROM 'list' LIMIT 1;"));
$min = $minRow["id"];
while (count($result) < 30) {
    $ids = array();
    while (count($ids) < 100) {
        $id = mt_rand($min, $max);
        $ids[$id] = 1;
    }
    $res = mysql_query("SELECT * from 'list' WHERE id IN (" . join(',', array_keys($ids)) . ") LIMIT 30");
    while (($row = mysql_fetch_assoc($result)) && (count($result) < 30)) {
        $result[] = array( ... ); // stuff results here
    }
}

// output
?>
Sajid
  • 4,381
  • 20
  • 14
  • Side note: random IDs may not get you a full 30 rows, so here I check and get some more if needed. – Sajid Nov 25 '12 at 06:53
1

May i suggest a more robust approach.

  1. You have to take in account that the number you go look for may not exist
  2. Only using one MySql query is often faster

Base on url1 and url2

You can have this php code instead :

<?php
// Connecting, selecting database
   $link = mysql_connect('mysql_host', 'mysql_user', 'mysql_password') or die('Could not connect: ' . mysql_error());
   mysql_select_db('my_database') or die('Could not select database');

$query = "SELECT `url`
          FROM `liste`
          ORDER BY RAND()
         LIMIT 30" ;

$result = mysql_query($query);

if($result == false)
{
   user_error("Query failed: " . mysql_error() . "<br />\n$query");
}
elseif(mysql_num_rows($result) == 0)
{
   echo "<p>Sorry, we're updating this section of our website right now!</p>\n";
}
else
{  $query_row = array();
   while($query_row = mysql_fetch_assoc($result))
    { 
        echo $query_row['url']; // no need to do the extra foreach
     }
 }

?>

The values mysql_host,mysql_user,mysql_password,my_database should be replaced by your connection.

As long as you have 30 row in your title table your are ok.

Jeffrey Nicholson Carré
  • 2,950
  • 1
  • 26
  • 44