I have a query in Oracle that I've been banging my head against for a while now. It is composed of separate subqueries joined to each other, and the GROUP BY is messing things up.
SELECT TA.PROJECTID,
TA.TASKID,
TA.DUEDATE,
MH.DATE_EXECUTION,
MAX (TA.LAST_UPDATED_DATE) AS LAST_UPDATED_DATE
FROM (TASKS TA
INNER JOIN
(SELECT PA.PROJECTID, MIN(PA.DATE_EXECUTION)
FROM APPROVALS pa
GROUP BY pa.PROJECTID) MH
ON TA.PROJECTID = MH.PROJECTID)
WHERE
TA.LAST_UPDATED_DATE <= MH.DATE_EXECUTION
GROUP BY TA.PROJECTID,
TA.TASKID,
MH.DATE_EXECUTION,
TA.DUEDATE
The first query block on the right side of the first inner join (select pa.projectid...) returns a table with project IDs and a DATE_EXECUTION column showing the date that the project was initiated - one line per project.
I am trying to return a single row per project ID with the latest LAST_UPDATED_DATE before the DATE_EXECUTION for that project. The grouping by DUEDATE is messing things up and causing it to return multiple rows. The idea here is that each row in the TASKS table is a different time a user updated the due date for the project, and I'm trying to identify the due date that was current when the project started execution (the DATE_EXECUTION).
I feel like I'm almost there, just not sure what I'm missing in my approach.
Sample data is:
PROJECTID | TASKID | DUEDATE | DATE_EXECUTION | LAST_UPDATED_DATE |
---|---|---|---|---|
abc123 | 1 | 2022-01-07 | 2021-03-31 | 2021-03-03 |
abc123 | 1 | 2021-12-20 | 2021-03-31 | 2021-03-11 |
abc123 | 1 | 2021-12-27 | 2021-03-31 | 2020-11-18 |
abc123 | 1 | 2022-01-01 | 2021-03-31 | 2021-05-01 |
So in the above data for the combination of project abc123 and task ID 1, I'd want the second row: the row where the LAST_UPDATED_DATE is the latest prior to the DATE_EXECUTION.