I am building a dynamic MySQL user search query that must handle many different search criteria. I considered writing a stored procedure, but ended up building the query in the client (as a prepared statement in PHP). One of the criteria is the user's age, i.e. between X and Y years old, and I am wondering how to do this as efficiently as possible. The final query will be fairly complex, with several joins, and may run against a few million rows in the future, so I need to optimize it where I can. I store each user's date of birth in an indexed DATE column (YYYY-MM-DD format). I have the following User Defined Function (UDF) for calculating a user's age:
RETURN (DATE_FORMAT(current_time, '%Y') - DATE_FORMAT(date_of_birth, '%Y') - (DATE_FORMAT(current_time, '00-%m-%d') < DATE_FORMAT(date_of_birth, '00-%m-%d')));
The details of the calculation are not important; I am more concerned with how it is used. One of my worries is that using this UDF in my WHERE clause would slow the query down significantly, because it has to run on every single row, even though I declared the UDF deterministic. I cannot guarantee that other criteria will narrow down the matching rows before the ages are checked. I can't just check the date of birth against a date, because that would not be accurate. I am wondering whether pulling the above calculation out of the UDF and embedding it directly in the query's WHERE clause would make a noticeable difference (I think it would). The downside is that the WHERE clause would be further complicated by the calculation (or actually two of them, unless there is a way to reuse the result). But I guess there is no way to avoid those calculations. Is performing this calculation in the WHERE clause the way to go with regard to performance, or is there a better way?
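To make the usage concrete, here is a rough sketch of the kind of filter I have in mind. The name calculate_age is only a placeholder for the UDF above, and the id column is assumed for illustration; the real query would have more joins and criteria.

```sql
-- Hypothetical names: calculate_age() stands for the UDF shown above,
-- and id is an assumed primary key column on the user table.
SELECT u.id
FROM `user` AS u
-- The function is applied to the column value of every candidate row.
WHERE calculate_age(u.date_of_birth) BETWEEN 25 AND 30;
```

As far as I understand, because date_of_birth is wrapped in a function call here, the index on that column cannot be used for this predicate, which is exactly the per-row cost I am worried about.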
Theoretically, I guess I could even add an age column to the user table, calculate the age when a user registers, and run a scheduled job (cron job) every night to update the ages of users whose birthday is today (if I can select them efficiently). This would surely speed up my search query, but it would introduce redundant data. Therefore I would really only want to do this if the calculation cannot be done efficiently within the search query itself.
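For reference, the nightly job could look roughly like the sketch below. It assumes an integer age column has already been added to the user table, and it uses MySQL's built-in TIMESTAMPDIFF in place of my UDF purely to keep the example short.

```sql
-- Sketch of the nightly update; assumes user.age exists (hypothetical column).
-- TIMESTAMPDIFF(YEAR, ...) is swapped in here for the UDF shown earlier.
UPDATE `user`
SET age = TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE())
-- Only touch rows whose birthday (month and day) is today.
WHERE MONTH(date_of_birth) = MONTH(CURDATE())
  AND DAYOFMONTH(date_of_birth) = DAYOFMONTH(CURDATE());
```

Two caveats with this sketch: the month/day filter also applies functions to the column, so it probably scans the whole table, and users born on February 29 would be skipped in non-leap years unless they are handled separately.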
So, to sum up: I need to search for users within a range of ages (e.g. 25 to 30). Should I calculate the age in the WHERE clause, or would that be very slow because it has to be done on every row? Is it a sacrifice I have to make, or do I have better alternatives?
Any help is much appreciated.