
I have a project in PHP + MySQL (a table with over 2,000,000 rows). Please look at this PHP code.

<?php 
            for($i=0;$i<20;$i++)
            {
                $start = rand(1,19980);
                $select_images_url_q = "SELECT * FROM photo_gen WHERE folder='$folder' LIMIT $start,2 ";
                $result_select = (mysql_query($select_images_url_q));
                while($row = mysql_fetch_array($result_select))
                    {
                    echo '<li class="col-lg-2 col-md-3 col-sm-3 col-xs-4" style="height:150px">
                                      <img class="img-responsive" src="http://static.gif.plus/'.$folder.'/'.$row['code'].'_s.gif">
                                </li>';
                }
            }
            ?>

This code runs very slowly because of the `$start = rand(1,19980);` offset. Please help: how can I make a random SELECT using MySQL's random function? Thank you.

Vahagn

3 Answers


Depending on what your code is doing with $folder, you may be vulnerable to SQL injection.

For better security, consider moving to PDO or MySQLi and using prepared statements. I wrote a library called EasyDB to make it easier for developers to adopt better security practices.

The fast, sane, and efficient way to select N distinct random elements from a database is as follows:

  1. Get the number of rows that match your condition (i.e. WHERE folder = ?).
  2. Generate a random number between 0 and this number.
  3. Select a row with a given offset like you did.
  4. Store the ID of the previously generated row in an ever-growing list to exclude from the results, and decrement the number of rows.

An example that uses EasyDB is as follows:

// Connect to the database here:
$db = \ParagonIE\EasyDB\Factory::create(
    'mysql:host=localhost;dbname=something',
    'username',
    'putastrongpasswordhere'
); 

// Maintain an array of previous record IDs in $exclude
$exclude = array();
$count = $db->single('SELECT count(id) FROM photo_gen WHERE folder = ?', $folder);

// Select _up to_ 40 values. If we have less than 40 in the folder, stop
// when we've run out of photos to load:
$max = $count < 40 ? $count : 40;

// The loop:
for ($i = 0; $i < $max; ++$i) {
    // The maximum value will decrease each iteration, which makes
    // sense given that we are excluding one more result each time
    $r = mt_rand(0, ($count - $i - 1));

    // Dynamic query
    $qs = "SELECT * FROM photo_gen WHERE folder = ?";

    // We add AND id NOT IN (2,6,7,19, ...) to prevent duplicates:
    if ($i > 0) {
        $qs .= " AND id NOT IN (" . implode(', ', $exclude) . ")";
    }
    $qs .= " ORDER BY id ASC LIMIT ".$r.", 1"; // note the leading space

    $row = $db->row($qs, $folder);

    /**
     * Now you can operate on $row here. Feel free to copy the
     * contents of your while($row=...) loop in place of this comment.
     */

    // Prevent duplicates
    $exclude []= (int) $row['id'];
}

Gordon's answer suggests using ORDER BY RAND(), which in general is a bad idea and can make your queries very slow. Furthermore, although he says that you shouldn't need to worry about there being less than 40 rows (presumably, because of the probability involved), this will fail in edge cases.

A quick note about mt_rand(): it's a biased, predictable random number generator with only about 4 billion possible seeds. If you want better results, look into random_int() (PHP 7 only, but I'm working on a compatibility layer for PHP 5 projects; see the linked answer for more information).
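A minimal sketch of that idea, preferring random_int() where it exists (PHP 7+) and falling back to mt_rand() on PHP 5 (`rand_offset()` is a hypothetical helper name):

```php
<?php
// Use the stronger generator when available, the weaker one otherwise.
function rand_offset($min, $max)
{
    if (function_exists('random_int')) {
        return random_int($min, $max); // CSPRNG-backed, unbiased (PHP 7+)
    }
    return mt_rand($min, $max);        // weaker, but available on PHP 5
}
```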

Scott Arciszewski
Your statement about my answer is flat-out incorrect. Running multiple queries is generally a bad idea. In this case, for 20 rows, it might or might not be efficient, depending on various factors related to the structure of the database and the connections to it. – Gordon Linoff Jul 15 '15 at 11:11

Actually, even though the table has 2+ million rows, I'm guessing that a given folder has many fewer. Hence, this should be reasonable with an index on photo_gen(folder):

SELECT *
FROM photo_gen 
WHERE folder = '$folder'
ORDER BY rand()
LIMIT 40;
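A runnable sketch of this query shape with a bound parameter instead of interpolating `'$folder'` into the SQL. It uses an in-memory SQLite database so the example is self-contained; SQLite spells MySQL's RAND() as RANDOM(), and the table contents here are made up:

```php
<?php
// Build a throwaway photo_gen table, with the index on folder the answer
// recommends, then run the ORDER BY random LIMIT 40 query as a prepared
// statement.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE photo_gen (id INTEGER PRIMARY KEY, folder TEXT, code TEXT)');
$db->exec('CREATE INDEX idx_photo_gen_folder ON photo_gen (folder)');

$ins = $db->prepare('INSERT INTO photo_gen (folder, code) VALUES (?, ?)');
for ($i = 0; $i < 100; ++$i) {
    $ins->execute(array('cats', 'img' . $i));
}

// The folder value is bound, never concatenated into the SQL string.
$stmt = $db->prepare(
    'SELECT * FROM photo_gen WHERE folder = ? ORDER BY RANDOM() LIMIT 40'
);
$stmt->execute(array('cats'));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```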

If a folder can still have tens or hundreds of thousands of examples, I would suggest a slight variation:

SELECT pg.*
FROM photo_gen pg CROSS JOIN
     (SELECT COUNT(*) AS cnt FROM photo_gen WHERE folder = '$folder') c
WHERE pg.folder = '$folder' AND
      rand() < 500 / c.cnt
ORDER BY rand()
LIMIT 40;

The WHERE expression should keep about 500 rows (subject to the vagaries of sample variation). There is a very high probability that at least 40 of them survive the filter, so you don't need to worry about it. The final sort is over only those few hundred rows, so it should be fast.

There are definitely other methods, but they are complicated by the where clause. The index is probably the key thing you need for improved performance.

Gordon Linoff
  • http://www.webtrenches.com/post.cfm/avoid-rand-in-mysql - with 2+ million records, `ORDER BY RAND()` seems like a bad idea. – Scott Arciszewski Jul 14 '15 at 20:09
  • It's interesting to me that your answer has a -1 vote-down. – Shafizadeh Jul 14 '15 at 21:10
  • @ScottArciszewski . . . The point is that the query does NOT sort all the data. It takes a few hundred rows and sorts that. Sorting a few hundred rows is generally fine, for this type of problem. You are clearly misunderstanding the query logic. – Gordon Linoff Jul 15 '15 at 11:08
  • What do you think of the solution I posted? I'm a PHP guy, not a MySQL DBA, so there might be nuances I'm unaware of. – Scott Arciszewski Jul 15 '15 at 11:10

It's better to compose your SQL query (as a string in PHP) once and then execute it just once, rather than running twenty queries in a loop.

Or, if it fits your case, you could use the approach described in: Select n random rows from SQL Server table
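A sketch of the "compose once, execute once" idea: run a single query up front, then render all the markup in one pass. Here `$rows` stands in for the fetched result set, `render_gallery()` is a hypothetical helper, and the `code` field name matches the question's table:

```php
<?php
// Build the whole <li> list from one result set instead of querying in a
// loop. The URL pieces are encoded before being echoed into HTML.
function render_gallery(array $rows, $folder)
{
    $html = '';
    foreach ($rows as $row) {
        $src = 'http://static.gif.plus/' . rawurlencode($folder)
             . '/' . rawurlencode($row['code']) . '_s.gif';
        $html .= '<li class="col-lg-2 col-md-3 col-sm-3 col-xs-4" style="height:150px">'
               . '<img class="img-responsive" src="' . htmlspecialchars($src) . '">'
               . '</li>';
    }
    return $html;
}
```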

Vlad DX