Select rows from tableA based on age calculation from tableB

Question

table1 we have ID, DOB(date of birth, eg. 01/01/1980) Table2 we have id and other columns

How to get all rows from table 2 if id is under the age of 20?

I currently have:

SELECT *
FROM table2
WHERE id IN (
    SELECT id
    FROM table1
    WHERE TIMESTAMPDIFF(Year,DOB,curdate()) <= 20
)

Is my solution efficient?

Never use `*`. Always define the columns you want to select. — Mech, Jun 09 '20 at 05:27

Caius Jard · Answer 1 · 2020-06-09T06:05:34.847

You would be better off calculating a date 20 years ago and asking if the table data is after that date. This means one calculation is needed, not a calculation for every row in the table. Any time that you perform a calculation on row data it means an index cannot be used. This is catastrophe for performance if DOB is indexed

TIMESTAMPDIFF doesn't count the number of years between two dates, it give you the number of times the year rolls over 31 dec for two dates. This means asking for the difference between 31 dec and 1 jan will report as 1 year when in fact it is only one (or upto two) days (depending on the times)

SELECT id  
FROM table1 
where DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)

Personally I use join rather than IN because once you learn the pattern it is easy to extend it using LEFT joins to look for rows that don't exist or match the patterns, but in practical terms the query optimizer rewrites IN and JOIN to execute them the same anyway. Some dB perform poorly for IN, because they execute them differently to joins

SELECT * 
FROM 
  table1 t1
  INNER JOIN table2 t2
  ON t1.id = t2.id
where t1.DOB > DATE_SUB(CURDATE(), INTERVAL 20 YEAR)

Mech is making the point about select * that it should be avoided in production code. That's a relevant point for the most part - always select only the columns you need (sometimes if a dB has indexed a table and you only need columns that are in the index, then using select * will be a performance hit because the dB has to use the index to look up which rows then lookup the rows. If you specify the columns you need it can decide whether it can answer the query purely from the index for a speed boost. The only time I might consider using select * is in a sub query where the optimizer will rewrite it anyway

Always alias your tables and use the aliases. This prevents your query breaking if later you add a column to either table that is the same name as a column in the other table. While adding things isn't usually a problem or cause bugs and crashes, if a query just "select name from a join b.." and only table a has a name column, it will start crashing if a name column is added to b. Specifying a.name would prevent this

Akina · Answer 2 · 2020-06-09T05:35:56.853

0

For MySQL

SELECT table2.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE table1.dob >= CURRENT_DATE - INTERVAL 20 YEAR

edited Jun 09 '20 at 05:35

answered Jun 09 '20 at 05:27

Akina

39,301
5
14
25

This encourages the poor practice of using `*` in a `select` statement. For more information, please read https://stackoverflow.com/a/3639964/4101210. – Mech Jun 09 '20 at 05:29
@Mech Some of my programs uses user-configured reporting service which may need in any column which exists in a table (moreover - any column which not exists now but may occur in future, and this may work without code correction), and at the same time the reporting subsystem does not have any feedback... may you offer any solution other than select all existing columns? Declaring any rule to be the absolute truth is what it is bad practice. – Akina Jun 09 '20 at 05:33
1

@Mech Do you really think that all provided solutions are applied by OPs without any edition? Point out a possible problem to OP, and let him take it into account or ignore it. – Akina Jun 09 '20 at 05:40
@Mech I think your downvote purely on the basis of use of `SELECT *` is uncalled for. The OP has not given any indication of what columns are in his table, and what he will ultimately do with this query is up to him (including any consequences). You have already made your point to him directly about it, there is no reason to parade round SO downvoting every sql question/answer that mentions select *. It may well be acceptable to select * in a single table sub query, for example – Caius Jard Jun 09 '20 at 05:49
Casting down vote because of `*` doesn't make sense. It is a Sql standard of providing a solution. Since the used did not provide columns in his query, Akina's solution is perfectly valid – Jim Macaulay Jun 09 '20 at 05:50
Despite `*` is not really good in defined quries, it sometimes not so bad. I'm giving voteup for the answer. It is fine for explanation purpuses. – RusArtM Jun 09 '20 at 06:33

score 0 · Answer 3 · answered Jun 09 '20 at 12:10

Historically, MySQL has implemented EXISTS more efficiently than IN. So, I would recommend:

SELECT t2.*
FROM table2 t2
WHERE EXISTS (SELECT 1
              FROM table1 t1
              WHERE t1.id = t2.id AND
                    TIMESTAMPDIFF(Year, t1.DOB, curdate()) <= 20
             );

For performance, you want an index on table1(id, DOB).

You can also change the year comparison to:

t1.DOB <= curdate() - interval 20 year

That is presumably the logic you want and the index could take advantage of it.

I recommend this over an join because there is no risk of having duplicate rows in the result set. Your question does not specify that id is unique in table1, so duplicates are a risk. Even if there are no duplicates, this would also have the best performance under many circumstances.

Select rows from tableA based on age calculation from tableB

3 Answers3