I'm currently working on a medium-sized web project, and I've ran into a problem.
What I want to do is display a question, together with an image. I have a (global) list of questions, and a (global) list of images, all questions should be asked for all images.
As far as the user can see the question and image should be chosen at random. However the statistics from the answers (question/image-pair) will be used for research purposes. This means that all the question/image-pair must be chosen such that the answers will be distributed evenly across all question, and across all images.
A user should only be able to answer a specific question/image-pair one time.
I am using a mysql database and php. Currently, i have three database tables:
tbl_images (image_id)
tbl_questions (question_id)
tbl_answers (answer_id, image_id, question_id, user_id)
The other columns are not related to this specific problem.
Solution 1:
Track how many times each image/question has been used (add a column in each table). Always choose the image and question that has been asked the least.
Problem:
What I'm actually interested in is distribution among questions for an image and vice versa, not that each question is even globally.
Solution 2:
Add another table, containing all question/image-pairs along with how many times it has been asked. Choose the lowest combination (first row if count column is sorted by ascending order).
Problem:
Does not enforce that the user can only answer a question once. Also does not give the appearance that the choice is random to the user.
Solution 3:
Same as #2, but store question/image/user_id in table.
Problem:
Performance issues (?), a lot of space wasted for each user. There will probably be semi-large amounts of data (thousands of questions/images and atleast hundreds of users).
Solution 4:
Choose a question and image at true random from all available. With a large enough amount of answers they will be distributed evenly.
Problem:
If i add a new question or image they will not get more answers than the others and therefore never catch up. I want an even amount of statistics for all question/image-pairs.
Solution 5:
Weighted random. Choose a number of question/image pairs (say about 10-100) at true random and pick the best (as in, lowest global count) of these that the user has not answered.
Problem:
Does not guarantee that a recently added question or image gets a lot of answers quickly.
Solution #5 is probably the best once I've come up with so far.
Your input is very much appreciated, thank you for your time.