my problem is , I don't want to give the same question two or more times to a user,
how can i keep a record of retrieved questions? Do I have to create tables for each
and every users? won't that increase the load time?
Yes, but possibly not so much.
You keep a single extra table with userId, questionId
and insert there the questions already asked to the various users.
When you ask question 123 to user 456, you run a single INSERT
INSERT INTO askedQuestions (userId, questionId) VALUES (456, 123);
Then you extract questions from questions
with a LEFT JOIN
SELECT questions.* FROM questions
LEFT JOIN askedQuestions ON (questions.id = askedQuestions.questionId AND askedQuestions.userId = {$_SESSION['userId']} )
WHERE askedQuestions.userId IS NULL
ORDER BY RAND() LIMIT 1;
if you keep askedQuestions
indexed on (userId, questionId)
, joining will be very efficient.
Notes on RAND()
Selecting on a table like this should not done with ORDER BY RAND()
, which will retrieve all the rows in the table before outputting one of them. Normally you would choose a questionId
at random, and select the question with that questionId
, and that would be waaaay faster. But here, you have no guarantee that the question has not been already asked to that user, and the faster query might fail.
When most questions are still free to ask, you can use
WHERE questions.questionId IN ( RAND(N), RAND(N), RAND(N), ... )
AND askedQuestions.userId IS NULL LIMIT 1
where N is the number of questions. Chances are that at least one of the random numbers you extract will still be free. The IN
will decrease performances, and you will have to strike a balance with the number of RAND
s. When questions are almost all asked, chances of a match decrease, and your query might return nothing even with many RAND
s (also because RAND
s will start yielding duplicate IDs, in what is known as the Birthday Paradox).
One way to achieve the best of both worlds could be to fix a maximum number of attempts, say, three (or better still, based on the number of questions left over).
For X times you generate (in PHP) a set of Y random ids betweeen 1 and 1000, and try to retrieve (userId, questionId)
from askedQuestions
. The table is thin and indexed, so this is really fast. If you fail, then the extracted questionId
is random and free, and you can run
SELECT * FROM questions WHERE id = {$tuple['questionId']};
which is also very fast. If you succeed X times, i.e., for X times, all Y random questionIds are registered as being already asked, then you run the full query. Most users will be served almost instantly (two very quick queries), and only a few really dedicated users will require more processing. You might want to set some kind of alerting to warn you of users running out of questions.