0

what I am trying to do is prevent MySQL to select duplicated rows by the table primary key.

Please note, data is not real but just for the sake of the question.

prospect_evaluations
id | category   | archived_at | discarded_at | hidden_at
1  | to_contact |      null   |         null |       null
2  | to_contact |      null   |         null |       null
3  | to_convert |      null   |         null |       null
4  | to_convert |      null   |         null |       null

prospect_evaluations_actions
id | evaluation_id | created_at
1  |             1 | 01-02-2017 01:00:00
2  |             1 | 01-02-2017 02:00:00
3  |             2 | 01-02-2017 03:00:00
SELECT prospect_evaluations.*,
       prospect_evaluations_actions.evaluation_id,
       prospect_evaluations_actions.created_at AS lastaction_created_at
FROM `prospect_evaluations`
LEFT JOIN `prospect_evaluations_actions` ON `prospect_evaluations_actions`.`evaluation_id` = `prospect_evaluations`.`id`
WHERE `prospect_evaluations`.`category` = 'to_contact'
  AND `archived_at` IS NULL
  AND `discarded_at` IS NULL
  AND `prospect_evaluations`.`hidden_at` IS NULL
ORDER BY IFNULL( CAST(prospect_evaluations_actions.created_at AS date), CAST(prospect_evaluations.created_at AS date) ) DESC, `prospect_evaluations`.`priority` DESC

What's the query result?

id | category   | archived_at | discarded_at | hidden_at | lastaction_created_at
1  | to_contact |      null   |         null |   null    | 01-02-2017 01:00:00
1  | to_contact |      null   |         null |   null    | 01-02-2017 02:00:00
2  | to_contact |      null   |         null |   null    | 01-02-2017 03:00:00

What do I want instead?

id | category   | archived_at | discarded_at | hidden_at | lastaction_created_at
1  | to_contact |      null   |         null |   null    | 01-02-2017 02:00:00
2  | to_contact |      null   |         null |   null    | 01-02-2017 03:00:00

EDIT: I've just noticed that results are duplicated depending on how many actions they have linked to the evaluation, like if the evaluation has had four actions in the history, it's gonna be duplicated four times. I only need the LAST action so that I can select the creation date and use it to order my entries!

EDIT 2: This situation, in comparison to what has been flagged as a possible duplicate, is different because this question consists of:

  • SELECT of table1
  • LEFT JOIN of table2.

The duplicated question instead consists of:

  • SELECT of table1
  • LEFT JOIN of table1
GiamPy
  • 3,543
  • 3
  • 30
  • 51
  • 1
    The column count in the query and in the result set are not the same. You are missing columns. FYI DISTINCT is working on ROW level, not on individual column or primary key. – Pred Feb 01 '17 at 15:08
  • @Pred I've fixed the result set so that it makes more sense. How do I make it work on COLUMN level instead? Or, how do I limit the results of the LEFT JOIN to 1 only, since that's what causes the problem? – GiamPy Feb 01 '17 at 15:10
  • Are they duplications now? – Pred Feb 01 '17 at 15:11
  • Possible duplicate of [Retrieving the last record in each group](http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group) – Pred Feb 01 '17 at 15:12
  • It's not a duplicate. I've explained the differences between the two. – GiamPy Feb 01 '17 at 15:18
  • Nope, you want the last row from all groups based on another column. It is the same issue, even if you have a left join or whatever involved. – Pred Feb 01 '17 at 15:19
  • I do not seem to understand how to apply that solution then because of the two "differences". It might be obvious to you, not to me though. – GiamPy Feb 01 '17 at 15:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/134598/discussion-between-giampy-and-pred). – GiamPy Feb 01 '17 at 15:36

3 Answers3

1

After the chat and the updates in the question, here is the essence of the solution:

For only one column, you can use a correlated subquery:

SELECT
  prospect_evaluations.* 
  , (SELECT MAX(created_at) FROM prospect_evaluations_actions pea WHERE pea.evaluation_id = pe.id) AS last_action_at 
FROM
  `prospect_evaluations` pe

Or you can left join a subquery which calculates the result for each evavluation_id:

SELECT prospect_evaluations.*,
       prospect_evaluations_actions.evaluation_id,
       prospect_evaluations_actions.created_at AS last_action_at 
FROM `prospect_evaluations`
LEFT JOIN (select evaluation_id, max(created_at) as last_action_at from prospect_evaluations_actions group by evaluation_id) pea
  ON `pea`.`evaluation_id` = `prospect_evaluations`.`id`

In order to retrieve the whole record, it is a bit more tricky:

You have to do a 'self join' and use the result as a subquery:

SELECT 
 ...
FROM
  `prospect_evaluations` pe
  LEFT JOIN (
    SELECT pea.*
    (select evaluation_id, max(created_at) as last_action_at from prospect_evaluations_actions group by evaluation_id) pea_max
    INNER JOIN prospect_evaluations_actions pea
      on pea_max.evaluation_id = pea.evaluation_id and pea_max.last_action_at = pea.created_at
  ) pea_record
    ON pe.id = pea_record.evaluation_id

Keep in mind, that this will only work when created_at is unique per evaluation_id!

None of these queries are tested, I may have typos.

Pred
  • 8,789
  • 3
  • 26
  • 46
0

If you only want the last action, you need to specify so - MAX is your friend.

DISTINCT filters out duplicate rows (where there are two rows with the same data in every column).

Aggregate functions (min, max, avg, sum) allow you to take a column and perform arithmetic on it. If you want the last one, your looking for the row with the latest created_at.

SELECT prospect_evaluations.*,
       prospect_evaluations_actions.evaluation_id,
       max(prospect_evaluations_actions.created_at) AS lastaction_created_at
FROM `prospect_evaluations`
LEFT JOIN `prospect_evaluations_actions` ON `prospect_evaluations_actions`.`evaluation_id` = `prospect_evaluations`.`id`
WHERE `prospect_evaluations`.`category` = 'to_contact'
  AND `archived_at` IS NULL
  AND `discarded_at` IS NULL
  AND `prospect_evaluations`.`hidden_at` IS NULL
ORDER BY IFNULL( CAST(prospect_evaluations_actions.created_at AS date), CAST(prospect_evaluations.created_at AS date) ) DESC, `prospect_evaluations`.`priority` DESC
group by prospect_evaluations.*,
       prospect_evaluations_actions.evaluation_id
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • I tried it multiple times, and it seems like it has got syntax errors. Oh, and GROUP BY goes before ORDER BY! – GiamPy Feb 01 '17 at 15:31
0

If you want to get the lastaction_created date, you need to do a MAX() over the date. You then group by prospect_evaluations.id.

SELECT prospect_evaluations.*,
       prospect_evaluations_actions.evaluation_id,
       MAX(prospect_evaluations_actions.created_at) AS lastaction_created_at
FROM `prospect_evaluations`
LEFT JOIN `prospect_evaluations_actions` ON `prospect_evaluations_actions`.`evaluation_id` = `prospect_evaluations`.`id`
WHERE `prospect_evaluations`.`category` = 'to_contact'
  AND `archived_at` IS NULL
  AND `discarded_at` IS NULL
  AND `prospect_evaluations`.`hidden_at` IS NULL
GROUP BY prospect_evaluations.id
    ORDER BY IFNULL( CAST(prospect_evaluations_actions.created_at AS date), CAST(prospect_evaluations.created_at AS date) ) DESC, `prospect_evaluations`.`priority` DESC
ESP32
  • 8,089
  • 2
  • 40
  • 61
  • "Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'devdb_core.prospect_evaluations_actions.created_at' which is not functionally dependent on columns in GROUP BY clause." – GiamPy Feb 01 '17 at 15:32