1

I have searched all over for an answer and although people say not to use the ORDER BY RAND() clause, I think for my purposes it is ok as this is for a competition which barely has more than a few hundred records at a time PER competition.

So basically i need to retrieve 5 random records from a competition entries table. However any loyalty customers will received an additional EXTRA entry so example:

compEntryid | firstName | lastName | compID |
1           |  bob      |  smith   | 100
2           |  bob      | smith    | 100
3           |  jane     | doe      | 100
4           |  sam      | citizen  | 100

etc

So we are giving the loyalty members a better chance at winning a prize. However im a little worried that the returned result from a usual ORDER BY RAND() can include 2 entries of the SAME person ? What is an optimised method to ensure that we truly have 5 random records but at the same time giving those extra entrants a better or (weighted) chance ? Happy to use multiple queries, sub-queries or even a mix of MySQL and PHP ? Any advice is deeply appreciated thank you !

Bass

EDIT:

These 2 queries both work!

query1

  SELECT concat(firstName, " ", lastName) name,id, email 
    FROM t WHERE 
      RAND()<(SELECT ((5/COUNT(id))*10) FROM t) 
      group by email ORDER BY RAND()  limit 5;

query2

 select distinct 
    email, id, firstName, lastName from 
    (
    select id ,
    email, firstName , lastName , compID, rand()/(select count(*) from t where 
                                             email=t1.email 
                                             ) as rank
    from t t1 
    where compID = 100 
    order by rank) t2 limit 5;

http://sqlfiddle.com/#!2/73470c/2

Bass
  • 11
  • 4
  • You could make use of [`DISTINCT()`](http://www.mysqltutorial.org/mysql-distinct.aspx) – Funk Forty Niner Jun 05 '14 at 12:53
  • What would the query look like with distinct ? also in my schema there may be holes between ID's however they are sequential no matter what in the ID's ? – Bass Jun 05 '14 at 12:59
  • I do not believe this is a duplicate. The posted response giving a link is for a different (and far simpler) problem to just get random records. This question wants random records without duplicates but biasing the random records returned by the number of duplicates for an entry. – Kickstart Jun 05 '14 at 14:32
  • 1
    Although it is similar to that question, that particular solution works for ONE record and if you were to run that through a PHP loop there is a big possibility in your iterations that you may get the same entrant again otherwise you would have to keep a log of each random generated number and check that it isn't already in your pool of 5 random numbers, i feel this can be done with SQL alone and simply trying to negotiate an answer i hope that makes sense. – Bass Jun 05 '14 at 21:53

2 Answers2

1

If you have a few hundred record, I think that order by rand() solution should be fine: subquery will order weighting number of entries, but duplicates remains. Parent SELECT will take the first 5 distinct rows.

SELECT DISTINCT firstName , 
                lastName , 
                compID 
FROM
( SELECT compEntryid ,firstName , lastName , compID, rand()/(select count(*) 
  FROM   t 
  WHERE  firstName=t1.firstName AND
         lastName = t1.lastName) AS rank
  FROM   t t1 
  WHERE  compID = 100 
  ORDER BY rank) t2 
LIMIT 5

Fiddle

kiks73
  • 3,718
  • 3
  • 25
  • 52
  • As you can see in linked question, `ORDER BY rand()` is relatively slow. – Lkopo Jun 05 '14 at 13:04
  • i understand it is slow, but given the table will have no more than a few hundred records it seems most trivial ? Unless i use something like select cols from table where rand() < 5/(select count(*) from table ) where compid = 100 limit 5 but even that i tried and doesn't work – Bass Jun 05 '14 at 13:20
  • @Bass I've edited my answer. I think this could be the right way if you have not performance problems. – kiks73 Jun 05 '14 at 14:04
  • Interesting solution. I quite like that. It doesn't bring back a compEntryId. You might be able to get that using MIN / GROUP BY, but GROUP BY implicitly adds an order which might mess up the randomness for the limit. – Kickstart Jun 05 '14 at 14:25
  • Thanks kiks73 ! your answer and kikstarts answer both work a treat , however which is more optimised ? does your inner query still need to create that temporary table due to use of ORDER BY RAND() ? or because its's a subquery that doesn't apply ? Also i will need a WHERE clause for the COMPID so that i dont include other competition entrants ? thanks ! – Bass Jun 05 '14 at 21:48
  • @Bass, I've edited again my answer and the fiddle too with the where condition on compID. Onestly I didn't pay attention to the execution plan to check how much order by rand() impacts. – kiks73 Jun 06 '14 at 15:41
  • kiks73 thanks so much mate ! I may also have to include a DISTINCT CONCAT on firstname, lastname and email as this is the only way to determine if it is a DIFFERENT entrant ? This is really difficult this task as a group by will almost nullify the advantage of having multiple entries and this is what is really testing me at the moment! – Bass Jun 07 '14 at 00:48
  • @Bass DISTINCT already is in parent query. No need to insert a CONCAT because DISTINCT works on the whole row. – kiks73 Jun 07 '14 at 05:38
0

I think you will need to use a sub query if you want to return a compEntryid.

SELECT t.firstName, t.lastName, t.compID, MIN(compEntryid)
FROM t
INNER JOIN
(
    SELECT DISTINCT firstName, lastName, compID
    FROM t
    ORDER by rand() 
    LIMIT 5
) t2
ON t.firstName = t2.firstName
AND t.lastName = t2.lastName
AND t.compID = t2.compID
GROUP BY t.firstName, t.lastName, t.compID;

This uses a sub query to get 5 random firstName / lastName / compID. Then joins against the table to get the MIN compEntryId.

However not certain about this. Think it will eliminate the duplicates in the sub query before performing the order / limit, which would prevent someone with more entries having more chances.

EDIT

More of a play and I think I have found a solution. Although efficiency is not one of its strong points.

SELECT MIN(compEntryid), firstName, lastName, compID
FROM
(
    SELECT firstName, lastName, compID, compEntryid, @seq:=@seq+1 AS seq
    FROM
    (
        SELECT firstName, lastName, compID, compEntryid
        FROM t
        ORDER by rand()
    ) sub0
    CROSS JOIN (SELECT @seq:=0) sub1
) sub2
GROUP BY sub2.firstName, sub2.lastName, sub2.compID
ORDER BY MIN(seq)
LIMIT 5

This has an inner sub query that gets all the records in a random order. Around that another sub query adds a sequence number to the records. The outer query groups by the name, etc, and orders by the min sequence number for that name. The compEntryId is just grabbed as the MIN for the name / competition (I am assuming you don't care too much about this).

This way if someone had 5 entries the inner sub query would mix them up in the list. the next sub query would add a sequence number. At this stage those 5 entries could be sequence numbers 1 to 5. The outer one would order by the lowest sequence number for the name and ignore the others, so of those 5 only sequence number 1 would be used and 2 to 5 ignored, with the next selected person being the one with sequence number 6.

This way the more entries they have the more likely they are to be a winner, but can't be 2 of the 5 winners.

With thanks to kiks73 for setting up some sqlfiddle data:-

http://sqlfiddle.com/#!2/cd777/1

EDIT

A solution based on that above by @kiks73. Tweaked to use a non correlated sub query for the counts, and eliminates a few uncertainties. For example with his solution I am not quite sure whether MySQL will chose to do the DISTINCT by implicitly doing a GROUP BY, which would also implicitly do an orderering of the results prior to doing the limit (it doesn't seem to, but I am not sure this behaviour is defined).

SELECT t.firstName , 
        t.lastName , 
        t.compID,
        MIN(rand() / t1.entry_count) AS rank
FROM
(
    SELECT firstName, lastName, compID, COUNT(*) AS entry_count
    FROM   t 
    GROUP BY firstName, lastName, compID
) t1
INNER JOIN t
ON  t.firstName=t1.firstName 
AND t.lastName = t1.lastName
AND t.compID = t1.compID
GROUP BY t.firstName, t.lastName, t.compID
ORDER BY rank
LIMIT 5
Kickstart
  • 21,403
  • 2
  • 21
  • 33
  • Thank you kickstart, a little combersome to understand however i tested it and it works as per above solution too. I also should mention that i have an email field which would be unique to the user so we wouldn't need to check on firstname and lastname is that correct ? And finally i need to to a WHERE on compid ? – Bass Jun 05 '14 at 21:50
  • I have tried writing a stored procedure but im having problems with restricting it to a particular compid ! ? – Bass Jun 06 '14 at 01:00
  • Restricting it to a particular compid would just need a WHERE clause for it within the inner sub query. This query carries through the first name / last name from the inner sub query. You could just get the email address, but depends if you need the other details returned. It is cumbersome and I doubt it is that fast. – Kickstart Jun 06 '14 at 08:38
  • none the less your query works perfectly ! seems weighted as well, although a order by rand() limit 5 and a group by yeilds similar results it doesn't seem as weighted is that right ? – Bass Jun 06 '14 at 09:07
  • I honestly do not know. It might be worth doing some tests with a small set of data. My suspicion is that the GROUP BY will eliminate the duplicates before ordering by RAND. Hence those records with multiple entries will not have an increased chance of being within the 5 winners. It is the weighting that makes your requirement so complex. The suggestion by @kiks73 does the weighting differently (and quite cleverly). Not sure on the accuracy of it, but suspect at worst it will be fine for most requirements and it should be more efficient. – Kickstart Jun 06 '14 at 09:50
  • 1
    Kickstart you're an absolute genious mate ! I noticed in your profile you have zero artistic abilities ! im an Art director for profession and programmer enthusiast when im not designing, if you ever need any help in that department im more than happy to assist my friend :) – Bass Jun 08 '14 at 13:02