0

I have many tables that log the users action on some forum, each log event has it's date. I need a query that gives me all the users that wasn't active in during the last year. I have the following query (working query):

SELECT *
FROM (questions AS q
    INNER JOIN Answers AS a
    INNER JOIN bestAnswerByPoll AS p
    INNER JOIN answerThumbRank AS t
    INNER JOIN notes AS n
    INNER JOIN interestingQuestion AS i ON q.user_id = a.user_id
    AND a.user_id = p.user_id
    AND p.user_id = t.user_id
    AND t.user_id = n.user_id
    AND n.user_id = i.user_id)
WHERE DATEDIFF(CURDATE(),q.date)>365
    AND DATEDIFF(CURDATE(),a.date)>365
    AND DATEDIFF(CURDATE(),p.date)>365
    AND DATEDIFF(CURDATE(),t.date)>365
    AND DATEDIFF(CURDATE(),n.date)>365
    AND DATEDIFF(CURDATE(),i.date)>365

what i'm doing in that query - joining all the tables according to the userId, and then checking each date column individually to see if it's been more then a year

I was wondering if there is a way to make it simpler, something like finding the max between all dates (the latest date) and compering just this one to the current date

Darko Kenda
  • 4,781
  • 1
  • 28
  • 31
Shai Zarzewski
  • 1,638
  • 3
  • 19
  • 33
  • 1
    read about `UNION` instead of `JOIN`, also learn to format your code so it is readable with at little effort on the readers part, as it stands that lump of mess isn't worth parsing through to give a complete real answer! Nobody is going to put more effort into an answer than you put into your question, at this point, that is very little. –  Apr 27 '13 at 19:19

2 Answers2

2

If you want to get best performance, you cannot use greatest(). Instead do something like this:

SELECT *
FROM questions q
JOIN Answers             a  ON q.user_id = a.user_id
JOIN bestAnswerByPoll    p  ON a.user_id = p.user_id
JOIN answerThumbRank     t  ON p.user_id = t.user_id
JOIN notes               n  ON t.user_id = n.user_id
JOIN interestingQuestion i  ON n.user_id = i.user_id
WHERE q.date > curdate() - interval 1 year
  AND a.date > curdate() - interval 1 year
  AND p.date > curdate() - interval 1 year
  AND t.date > curdate() - interval 1 year
  AND n.date > curdate() - interval 1 year
  AND i.date > curdate() - interval 1 year

You want to avoid datediff() such that MySQL can do index lookup on date column comparisons. Now, to make sure that index lookup works, you should create compound (multi-column) index on (user_id, date) for each one of your tables.

In this compound index, first part (user_id) will be user for faster joins, and second part (date) will be used for faster date comparisons. If you replace * in your SELECT * with only columns mentioned above (like user_id only), you might be able to get index-only scans, which will be super-fast.

UPDATE Unfortunately, MySQL does not support WITH clause for common table expressions like PostgreSQL and some other databases. But, you can still factor out common expression as follows:

SELECT *
FROM questions q
JOIN Answers             a  ON q.user_id = a.user_id
JOIN bestAnswerByPoll    p  ON a.user_id = p.user_id
JOIN answerThumbRank     t  ON p.user_id = t.user_id
JOIN notes               n  ON t.user_id = n.user_id
JOIN interestingQuestion i  ON n.user_id = i.user_id,
(SELECT curdate() - interval 1 year AS year_ago) x
WHERE q.date > x.year_ago
  AND a.date > x.year_ago
  AND p.date > x.year_ago
  AND t.date > x.year_ago
  AND n.date > x.year_ago
  AND i.date > x.year_ago
mvp
  • 111,019
  • 13
  • 122
  • 148
1

In MySQL, you can use the greatest() function:

WHERE DATEDIFF(CURDATE(), greatest(q.date, a.date, p.date, t.date, n.date, i.date)) > 365

This will help with readability. It would not affect performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786