I have a website where people can vote on cars. 4 cars are showed to the user and he/she can vote on the car they like most.
The table cars
has the important columns:
car_id int(10) (not auto_increment, so has gaps)
views int(7)
points int(7)
car_type int(1) (value = 1, 2 or 3)
At the moment I use a mapping table for all car_types, which has a PK which has no gaps. I select the max ID of the mapping table and create 4 random numbers (PHP), select those rows from mapping and get the corresponding car_id's. These numbers I use to select the cars from cars
table.
The problem is that cars added later to the database have less chance to get on the same points as cars added earlier.
My question is how to show 4 cars which have the same amount of points(random) ordered by the least number of views (views asc). Also the important notes:
- The select should only query cars with at least 1 point.
- The database will contain more than 30M cars, it's not about cars but for the question I think it's easier :).
- When 70% of the cars have 1 points, 20% have 2 points and 10% have 3 points, than the random points should select cars 70% of the time with 1 point 20% with 2 points and 10% with 3 points.
- The query will be used to show 4 cars to the visitor, we all know users are impatient so the faster the query the better :)
- I could (if needed) use a mapping table, which will have no gaps in the PK (as I have now).
- Cars only within a certain car_type will be showed. Example, 4 randoms of car type 2 (which is family cars), since I don't want to show sport cars and family cars at same time.
If you know another solution to solve the problem above, I'm willing to accept all kind of solutions (PHP/SQL).
Bounty because it's a bigger question (/answer) than the average Stackoverflow question. Bounty will be rewarded to person which describes the solution or (preferred) the code of the solution. Anyways it's my way to thank the persons helping me and to ensure I appreciate your help greatly.
UPDATE:
Thanks for all the answers so far! You're all right in your answers. I did think a lot about it last few hours and I started to realize that databases were actually never build for things like this (showing random data), it was created to show precise and accurate data with fast access. That's why selects on PK's with 30M rows or more are still very fast. Thats why I'm thinking about doing all the random stuff in PHP. So I generate in PHP like 40 random numbers and select those 40 rows from the mapping table of the right car type. This select with IN
is really fast (like 0.0006 seconds). After this select I got 40 car_ids which I also select with IN
from the cars table. I loop the cars and put them in an array and do some custom sorts (based on points and views). After this I select a random number from all the points within the 40 cars and grab the cars from the array closest to this number of points and with the least views. This way PHP takes care of the randomness and the views part and the queries because you ask for precise data are very fast (0.0006 seconds each).