4

I have 2 many to many tables, and a table to join them.

Officer

  • id
  • name

Report

  • id
  • performanceDate
  • title

report_officer

  • officer_id
  • report_id

I want to select all officers that haven't ever been associated with a report or who has not been associated with a report within a certain timeframe.

So far I have tried the following (below doesn't work for me!):

SELECT * 
  FROM Officer 
       LEFT JOIN report_officer 
            ON Officer.id = report_officer.officer_id 
       LEFT JOIN Report 
            ON Report.id = report_officer.report_id
 WHERE (performanceDate IS NULL 
        OR performanceDate < "2014-03-23 00:00:00" 
        OR performanceDate > "2014-04-01 00:00:00"
        )

My left join query works only when the officer has ONLY been associated with a report within a certain timeframe, but fails once they have multiple reports.

Result:

+------------+-----------------+
| officer_id | performanceDate |
+------------+-----------------+
|        130 | NULL            | # good

|        134 | 2014-03-02      | # bad - officer_id 134 has a performanceDate  
|        134 | 2014-03-09      | # on 2014-3-30, I do not want this in the results.
|        134 | 2014-03-16      | # 

|        135 | 2014-03-02      | # good
+------------+-----------------+

SQL Fiddle: http://sqlfiddle.com/#!2/1bf72/3 <- in the sql fiddle, please refer to the 'name' field for which columns I am looking to have returned.

Any ideas on how to make this work?

Ideally I would like to make this as simple as possible to work with my ORM. I am using doctrine and would prefer not to start using totally custom code (so if it can be done with only joins, that would be great). I though have a bad feeling I need a sub-query.

PressingOnAlways
  • 11,948
  • 6
  • 32
  • 59

6 Answers6

4
SELECT Officer.*, Report.performanceDate FROM Officer
LEFT JOIN report_officer ON Officer.id = report_officer.officer_id 
LEFT JOIN Report ON Report.id = report_officer.report_id
 AND
   (performanceDate > "2014-03-23 00:00:00" AND
    performanceDate < "2014-04-01 00:00:00")
WHERE Report.id IS NULL

You only want to join rows within a specific date range, so you must move the constraint into the on clause of the join and inverse the constraint.

If you want to remove duplicates, you could try a group by:

SELECT Officer.id, MAX(Report.performanceDate) FROM Officer
LEFT JOIN report_officer ON Officer.id = report_officer.officer_id 
LEFT JOIN Report ON Report.id = report_officer.report_id
 AND
   (performanceDate > "2014-03-23 00:00:00" AND
    performanceDate < "2014-04-01 00:00:00")
WHERE Report.id IS NULL
GROUP BY Officer.id

but you have to decide on which date you want to get, if there are multiple performance dates in your requested date range (or you could use GROUP_CONCAT to gather all dates).

Update

Actually I am relatively sure, that what you want to achieve is not possible with LEFT JOINs at all...

What always works are subquery solutions:

SELECT Officer.id as OfficerID, Officer.name,
Report.id as ReportID,
Report.performanceDate

FROM Officer
LEFT JOIN report_officer
  ON Officer.id = report_officer.officer_id 
LEFT JOIN Report
  ON Report.id = report_officer.report_id

WHERE Report.id IS NULL 
OR NOT EXISTS (
    SELECT * FROM report_officer
    INNER JOIN Report ON report_id = Report.id
    WHERE officer_id = Officer.id AND
      performanceDate > "2014-03-23 00:00:00" 
      AND performanceDate < "2014-04-01 00:00:00"
)

but these are not that performant... This one looks if there are reports which should prohibit outputting the row.

Ulrich Thomas Gabor
  • 6,584
  • 4
  • 27
  • 41
  • This works, but there are 2 problems. #1, I need to get the data in the performanceDate column - now I'm only getting null. #2, for record 134 in my example, I'm getting 3 rows with null performanceDate - it still doesn't solve my problem of duplicate rows. This one is getting close! – PressingOnAlways Feb 27 '14 at 08:52
  • You should be able to get the PerformanceDate just by adding it to the `SELECT` list. I added another query to the bottom, which should eliminate duplicates. – Ulrich Thomas Gabor Feb 27 '14 at 08:55
  • It doesn't work as expected, still only getting NULLs for performanceDate - http://sqlfiddle.com/#!2/6a29c/4 – PressingOnAlways Feb 27 '14 at 09:14
  • Ehm... if you want those Officers, where there exists no report for, where should MySQL take the PerformanceDate from? This exists only if there is a report. – Ulrich Thomas Gabor Feb 27 '14 at 09:18
  • I updated the sqlfiddle to make it more clear what I am looking for. http://sqlfiddle.com/#!2/1bf72/4 Your query is still returning rows I don't want and the ones that are outside the range do not have a date. – PressingOnAlways Feb 27 '14 at 09:21
  • I've come up with a solution, which uses a subquery... see above. – Ulrich Thomas Gabor Feb 27 '14 at 10:19
  • 1
    You [don't want to use BETWEEN with date/time/timestamp types](https://sqlblog.org/2011/10/19/what-do-between-and-the-devil-have-in-common). This _does_ apply to MySQL because you can define different numbers of fractional seconds. Any time you have `WHERE (SELECT COUNT(*) ...) = 0`, you're really looking at a `WHERE NOT EXISTS` clause. – Clockwork-Muse Feb 27 '14 at 11:00
  • @Clockwork-Muse I updated the answer with your feedback, as it seems reasonable. – Ulrich Thomas Gabor Feb 27 '14 at 11:09
  • No, you don't need `COUNT(*)` _at all_; you don't care about how many rows you actually get, just whether or not you have any. And you usually want to make timestamps an inclusive lower bound (`>=`), in case something happens at midnight (or the db normalizes everything to that point in time). – Clockwork-Muse Feb 27 '14 at 11:19
  • @Clockwork-Muse Sure, that should have been replaced. I just forgot. I do not change anything to the dates, since I am not sure that the questioner wants. They has to read this comment itself and insert into his constraints whatever time restrictions they want. – Ulrich Thomas Gabor Feb 27 '14 at 11:22
2

I want to select all officers that haven't ever been associated with a report or who has not been associated with a report within a certain timeframe.

Your two conditions are redundant: if an officer has not been associated, ever, then he also cannot have been associated in any timeframe, and will be selected by the second condition. If he has reports in the timeframe, then he isn't selected because of the second condition, but he also has at least one report and cannot satisfy the first.

So what you want is, "an officer with zero reports in the timeframe".

To do so, just invert the conditions: first get those reports within the selected timeframe (i.e. those officers you don't want); then LEFT JOIN with Officer, requiring that the join yields null. This will get you the other officers, those that have no report in the selected time frame (or maybe no report at all).

In this case, though, you cannot have a report date, since you have no reports (this is more apparent for those officers with no report at all):

SELECT
   Officer.id as OfficerID,
   Officer.name,
   MAX(Report.id) as ReportID,
   MAX(performanceDate) AS performanceDate
FROM Officer
LEFT JOIN report_officer ON (Officer.id = report_officer.officer_id)
LEFT JOIN Report ON (Report.id = report_officer.report_id 
   AND performanceDate BETWEEN 20140323 AND 20140401)
GROUP BY Officer.id, Officer.name
HAVING ReportID IS NULL;

I don't know about Doctrine and HAVING. If you cannot use a HAVING clause, you may try to emulate it by running this, which ought to be quite standard:

SELECT
   Officer.id as OfficerID,
   Officer.name,
   COUNT(Report.id) as reports
FROM Officer
LEFT JOIN report_officer ON (Officer.id = report_officer.officer_id)
LEFT JOIN Report ON (Report.id = report_officer.report_id 
   AND performanceDate BETWEEN 20140323000000 AND 20140401235959)
GROUP BY Officer.id, Officer.name;

and then apply a filter where reports is equal to 0, i.e., no reports in the given timeframe. You can add MAX(performanceDate) AS performanceDate, MAX(Report.id) AS ReportID to get the date of the (e.g. latest) report for those officers who do have at least one outside the timeframe. This might be not the report you'd want.

Care must be taken when specifying date range, since YYYYMMDD equals normally YYYYMMDD000000, which could lead to the equivalent of a semi-inclusive range. Otherwise, replace BETWEEN with performanceDate >= '2014-03-23 00:00:00' AND performanceDate <= '2014-04-01 23:59:59'.

LSerni
  • 55,617
  • 10
  • 65
  • 107
  • I think everyone misunderstood what I am looking for. I am looking for all the officers that do not have ANY reports yet (NULL) or the officers who do not have a report recently. When I tried your SQL, i get null results. – PressingOnAlways Feb 27 '14 at 08:44
  • Yes, this is what I meant. You do this by looking for those officers you **don't** want, because it's easier; then you use a `LEFT JOIN` to get the *other* officers. – LSerni Feb 27 '14 at 08:52
  • 1
    I added the sql fiddle. – PressingOnAlways Feb 27 '14 at 09:08
  • 1
    It's probably better to avoid [using BETWEEN with date/time/timestamp types](https://sqlblog.org/2011/10/19/what-do-between-and-the-devil-have-in-common). This _does_ apply to MySQL, because different fractional seconds can be declared for timestamps. Assuming that `MAX(id)` gets the most-recent report likely returns the correct results, but relies on meta knowledge. – Clockwork-Muse Feb 27 '14 at 11:02
  • @Clockwork-Muse , all true. Rather than removing BETWEEN I added an explanation (maybe I should have added something for reports being filed *exactly* on midnight, or being so rounded). Thanks. – LSerni Feb 27 '14 at 11:15
1

Thanks for everyone's help on the problem. My final solution was to use GROUP BY and HAVING clause.

@Iserni, I did not need to SELECT an officer with 0 reports in the timeframe, I was able to SELECT all the officers with reports outside the timeframe or officers with null reports using HAVING.

Here is my final code:

SELECT Officer.id AS OfficerID, Officer.name, Report.id AS ReportID, max(performanceDate) as maxPerfDate FROM Officer
LEFT JOIN report_officer ON Officer.id = report_officer.officer_id
LEFT JOIN Report ON Report.id = report_officer.report_id
GROUP BY Officer.id HAVING maxPerfDate is null OR 
           maxPerfDate < "2014-03-23 00:00:00" OR
           maxPerfDate > "2014-04-01 00:00:00";

The benefit of this is that I can utilize the performanceDate to report when the last time the officer filed a report or report that he never created a report. All the other solutions that were suggested removed the ability to retrieve the valuable information on when the last time the officer created a report.

PressingOnAlways
  • 11,948
  • 6
  • 32
  • 59
0

Adding another AND condition may solve your issue.

AND performanceDate NOT BETWEEN "2014-03-23 00:00:00" AND "2014-04-01 00:00:00"
Ravinder Reddy
  • 23,692
  • 6
  • 52
  • 82
0

Or you may exclude the records that satisfy your condition...

SELECT *
FROM OFFICER
WHERE ID NOT IN (SELECT OFFICER_ID
                 FROM REPORT_OFFICER)
   OR ID NOT IN (SELECT OFFICER_ID
                 FROM REPORT_OFFICER
                 WHERE performanceDate BETWEEN "2014-03-23 00:00:00" AND "2014-04-01 00:00:00")
Corrado Piola
  • 859
  • 1
  • 14
  • 18
0

Can you just use a WHERE NOT EXISTS statement like the following?

SELECT *
FROM Officer
WHERE NOT EXISTS
  (
    SELECT Report.ID
FROM 
  Report_Officer
   INNER JOIN 
  Report ON 
    Report_Officer.Report_ID = Report.ID
WHERE 
  Report_Officer.Officer_ID = Officer.ID AND
  Report.PerformanceDate BETWEEN "2014-03-23 00:00:00" AND "2014-04-01 00:00:00"
   )
AHiggins
  • 7,029
  • 6
  • 36
  • 54
  • Probably, but 2 things... #1, a subquery generally is not as efficient as just joins and more importantly, #2, I am not sure if I could execute such a subquery cleanly using my ORM. – PressingOnAlways Feb 27 '14 at 18:07