
My problem is pretty much self-explanatory, but I can't quite work out how to make it as efficient as possible.
I want to select a random entry from a MySQL database. I want it to be as fast and as efficient as possible (that's always the goal, isn't it?). When I select that row I want to select another row, but not the same one as before. If I select 10 rows, I want the 11th row to be different from all the others (let's say unique :) ). But when I run out of rows I want to "report an error".

To get to the heart of the problem: I am using PHP with MySQL. I have an input array containing the titles which have already been selected. Then I get the count of all items in the database so I know the maximum number of times I can loop.
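
For concreteness, the POST payload looks something like this (the titles here are made up):

items=Title A:;:Title B:;:Title C

which the explode() below turns into array("Title A", "Title B", "Title C"). Let's paste the code to see what we're dealing with here.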

try
{
    $db = new PDO("mysql:host=localhost;dbname=xxxxx;charset=utf8", "xxxx", "xxxx");

    $played = explode(":;:", $_POST['items']); //All already selected items are in $_POST separated by :;:

    $sql = "SELECT count(id) AS count FROM table"; //Lets get the total count of items

    $query = $db->prepare($sql);
    $query->execute();
    $result = $query->fetch(PDO::FETCH_ASSOC);

    $count = $result['count']; //There we are...
    $i = 0; //Index counter so we don't exceed the record count... well, kinda (more on that below)
    $output = Array(); //Collect the output here

    do //do while because we want something to be selected first
    {
        $sql = "SELECT FLOOR(RAND() * COUNT(*)) AS offset FROM table"; //From here

        $query = $db->prepare($sql);
        $query->execute();
        $result = $query->fetch(PDO::FETCH_ASSOC);
        $offset = $result['offset'];

        $sql = "SELECT itemXML FROM table LIMIT $offset, 1";

        $query = $db->prepare($sql);
        $query->execute();
        $result = $query->fetch(PDO::FETCH_ASSOC); //To here is some code to randomly select a record "as efficiently as possible"..

        $xml = simplexml_load_string($result['itemXML']);

        $i++;
    } while (in_array($xml->{"title"}, $played) && $i < $count); //While the record title is in the array and we have not exceeded the total number of records (that's the logic, but it isn't that simple, is it?)

    if ($i >= $count)
    {
        die("400"); //Just a random status code which I parse with the client.
    }

    $itemArr = Array("whatever" => $xml->{"whatever-attr"}, "title" => $xml->{"title"});
    array_push($output, $itemArr); //Let's push this to our output array

    echo json_encode($output); //Aaaaand finally lets print out the results
}
catch (Exception $e) //If anything went wrong lets notify our client so he can respond properly
{
    $output = Array("error" => $e->getMessage());
    die(json_encode($output));
}

Yes, well... The problem is: WHAT IF there are 10 records, 9 of them have already been selected, and the random draws keep landing on rows that are already in the array until the index counter $i reaches 10. Then there is one row that should have been selected, but it never is.
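
Conceptually, what I want is something like the following single query, where an empty result would mean "nothing left". I can't use it as-is, because the title lives inside itemXML rather than in its own column, so the title column below is hypothetical:

SELECT itemXML FROM table
WHERE title NOT IN ('Title A', 'Title B') -- hypothetical column; mine is inside the XML
ORDER BY RAND()
LIMIT 1;
-- zero rows returned => everything has been played => report the error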

And how do I fix this? Your help will be much appreciated!
If I didn't explain it well enough, let me know and I will try harder.

Majster
    Why don't you select all the records that you need at once and order these randomly? – jeroen Mar 07 '13 at 17:25
  • Selecting random data from MySQL will never be fast or efficient, since the "randomness" prevents the use of any indexes and generally makes you re-sort the entire data set every time you query. You should figure out something that *seems* random, but actually is not. – Sammitch Mar 07 '13 at 17:28
  • Because what happens when there are 100k records in there? A huge waste of time and space in my opinion. – Majster Mar 07 '13 at 17:28
  • @Sammitch I like your thought, but I don't know how to implement what you're suggesting. Perhaps give me some examples of some sort, so I have something to work with. – Majster Mar 07 '13 at 17:33
  • If you are going to select 100k records, selecting them all at once is less waste of time than selecting them one by one. What exactly are you trying to do? – jeroen Mar 07 '13 at 17:34
  • Well, to put it simply, I want to make a "shuffle" thingie. I want to select a random record, but not one I have previously selected. Clients send me the titles which they already have, and I need to provide them with new ones or respond with an error if there are none left. – Majster Mar 07 '13 at 17:37

2 Answers


I think you are taking the wrong approach here. You should not need to loop through the database querying one record at a time.

If you need to select 10 records, just select 10 records ordered by RAND(), like this:

SELECT * FROM table
ORDER BY RAND()
LIMIT 10;
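
From PHP that is a single round trip; a sketch using the PDO handle from the question:

$stmt = $db->prepare("SELECT * FROM table ORDER BY RAND() LIMIT 10");
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC); //up to 10 distinct random rows

Bear in mind, as the comments point out, that ORDER BY RAND() sorts the whole table, so it gets expensive as the table grows.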

Or if you have certain IDs that you want to omit from the selection:

SELECT * FROM table
WHERE id NOT IN (?, ?, ?, ...)
ORDER BY RAND()
LIMIT 10;
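
A prepared statement needs one placeholder per excluded value, so the list has to be built dynamically. A sketch, assuming $played holds the ids to exclude (note that NOT IN () with an empty list is invalid SQL, hence the guard):

if (count($played) > 0) {
    $placeholders = implode(',', array_fill(0, count($played), '?'));
    $stmt = $db->prepare("SELECT * FROM table WHERE id NOT IN ($placeholders) ORDER BY RAND() LIMIT 10");
    $stmt->execute($played);
} else {
    $stmt = $db->query("SELECT * FROM table ORDER BY RAND() LIMIT 10");
}
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);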

Or if the IDs you want to omit are stored in another table:

SELECT * FROM table
LEFT OUTER JOIN omit_table ON table.id = omit_table.id
WHERE omit_table.id IS NULL
ORDER BY RAND()
LIMIT 10;
Mike Brant
  • 10 records was a random number I picked to demonstrate. Afterwards there will be n records (n ∈ ℕ; I don't know how many). So I'm not sure if selecting all records is a good idea. – Majster Mar 07 '13 at 17:30
  • Omitting is a nice thought, but this is a part of PHP to which random clients send requests via AJAX. So I can't keep track of each and every one. – Majster Mar 07 '13 at 17:31
  • @Majster If you don't have a way to keep track of what records have been displayed to each client, I don't see how querying one row at a time is going to solve the problem of knowing which records should not be shown to that client. You will always need to have client-specific state (stored in a session, cookie, HTML localStorage, etc.) that can somehow be called upon to filter your query results. I don't follow your first comment as to why selecting all records at once is not a good idea. To me it is a much better idea than making 10, 20, 100, 1000 or whatever individual queries to the DB. – Mike Brant Mar 07 '13 at 17:38
  • The "forbidden" titles are in the $played variable. I know what not to send back. If I remove the index counter everything works just fine the problem is only that if all the records have been selected this falls into a endless loop. And since all the records have a large XML string inside selecting them all could take some time. Odds are that this will work just fine with 1000+ records because users will probably get tired after lets say 50 records. But I want to know how to solve this like a programmer should. – Majster Mar 07 '13 at 17:45

Let's say you have the following table filled with data already:

TABLE mydata
  id INT AUTO_INCREMENT PRIMARY KEY
  name VARCHAR
  ...

And we create the following table for some not-really-random mapping:

TABLE shufflemap
  id INT AUTO_INCREMENT PRIMARY KEY
  data_id INT UNIQUE INDEX

And we do the following:

$rs = $dbh->query('SELECT id FROM mydata')->fetchAll(PDO::FETCH_COLUMN);
shuffle($rs); //randomize the order once, in PHP
$ins = $dbh->prepare('INSERT INTO shufflemap (data_id) VALUES (?)');
foreach($rs as $data_id) {
    $ins->execute(array($data_id));
}

Now what if we want to add rows? You can either TRUNCATE the table and re-run the above code, or:

$my_new_id = 1234; //the ID of the new row inserted into `mydata`
$count = $dbh->query('SELECT COUNT(*) FROM shufflemap')->fetchColumn();
$target = rand(0, (int)$count - 1); //valid offsets are 0 .. count-1
$stmt = $dbh->prepare('SELECT id, data_id FROM shufflemap LIMIT ?, 1');
$stmt->bindValue(1, $target, PDO::PARAM_INT); //LIMIT needs an integer bind
$stmt->execute();
$row = $stmt->fetch(PDO::FETCH_ASSOC);
//Put the new id into the randomly chosen slot...
$dbh->prepare('UPDATE shufflemap SET data_id = ? WHERE id = ?')->execute(array($my_new_id, $row['id']));
//...and tack the displaced id onto the end
$dbh->prepare('INSERT INTO shufflemap (data_id) VALUES (?)')->execute(array($row['data_id']));

This picks one random entry from the shufflemap table in a reasonably efficient manner, replaces its data_id with the new one, and tacks the old one onto the end of the table.

Using this approach you can have your seemingly-random data with no repetitions, and still make use of all the proper indexes on your tables by using the shufflemap table in JOINs, subqueries, or whatever else you can come up with.
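
For example, if each client remembers the last shufflemap position it has seen (an assumption; in the question the clients currently send back titles instead), fetching the next unplayed item becomes an indexed range scan instead of a table sort, and an empty result is the natural "nothing left" signal:

SELECT d.*, s.id AS position
FROM mydata d
INNER JOIN shufflemap s ON d.id = s.data_id
WHERE s.id > ? -- last position this client has seen; 0 on the first request
ORDER BY s.id
LIMIT 1;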

edit

Let's say that the mydata table has a field indicating which client or user each row is associated with, i.e.:

TABLE mydata
  id INT AUTO_INCREMENT PRIMARY KEY
  client_id INT
  name VARCHAR
  ...

The shuffled listing of only that client's data can be retrieved by:

SELECT d.*
FROM mydata d INNER JOIN shufflemap s
  ON d.id = s.data_id
WHERE client_id = ?
ORDER BY s.id

Excluding a list of already-played items?

SELECT d.*
FROM mydata d INNER JOIN shufflemap s
  ON d.id = s.data_id
WHERE client_id = ?
  AND d.id NOT IN(?,?,?,...)
ORDER BY s.id
Sammitch
  • Yes, well, I find this approach very interesting! But if I am not mistaken, this only works for one client, or for two clients with two shufflemap tables. But how about n users? I will be having n users firing requests at the script I have pasted above. – Majster Mar 07 '13 at 17:56
  • @Majster Why would each user or client require a uniquely-random listing? Will they be comparing notes? Also, are all of your records not in a single table, differentiated by something like a client_id field? If not, you've improperly normalized your data. I've updated my answer with example queries. – Sammitch Mar 07 '13 at 18:09
  • Jolly good! :D I have to say, that's a nice approach. I will need to tune it a bit to be of perfect service, but it is definitely the solution I'm after. Thanks! – Majster Mar 07 '13 at 18:26