0

I have 2 tables which have to be joined to get needed data.

Table1 is articles, Table2 is for channels/categories.

For the sake of simplicity, both tables only have 2 fields.

Basically, I need to select N unique IDs for each group (term_id). Values should't repeat between groups.

I can achieve expected results with RANK and PARTITION but it only works with MySQL 8.x, while I need it to work in 5.7.

Fiddle with the table structure, some sample data and current query I have is here

How can this approach be adapted to 5.7 MySQL?

SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
CREATE TABLE `Table1` (
    `ID` BIGINT(20) UNSIGNED NOT NULL,
    `date` DATETIME NOT NULL);
CREATE TABLE `Table2` (
    `object_id` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
    `term_id` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0'
);

INSERT INTO `Table1` (`ID`, `date`) VALUES 
(195512, '2019-02-27 12:28:18'),
(195260, '2019-01-02 15:26:45'),
(195245, '2019-01-02 14:22:57'),
(192628, '2018-12-20 13:44:39'),
(192563, '2018-12-20 10:53:32'),
(191224, '2018-12-14 15:32:22'),
(189468, '2018-12-07 10:25:31'),
(188520, '2018-12-03 14:59:59'),
(187808, '2018-11-30 14:03:27'),
(185682, '2018-11-20 21:20:18'),
(195473, '2019-01-03 10:52:01'),
(192982, '2018-12-21 19:55:42'),
(192695, '2018-12-20 15:22:41'),
(192611, '2018-12-20 13:19:29'),
(192320, '2018-12-19 14:06:06'),
(192026, '2018-12-18 12:16:22'),
(191218, '2018-12-14 14:37:57'),
(190677, '2018-12-12 13:22:07'),
(190661, '2018-12-12 13:08:27'),
(190147, '2018-12-10 12:43:42'),
(195260, '2019-01-02 15:26:45'),
(192676, '2018-12-20 15:30:08'),
(192409, '2018-12-19 15:44:59'),
(192057, '2018-12-18 14:46:46'),
(191825, '2018-12-17 15:21:13'),
(190968, '2018-12-13 15:59:28'),
(190701, '2018-12-12 15:21:18'),
(190450, '2018-12-11 16:03:07'),
(190188, '2018-12-10 15:08:08'),
(189484, '2018-12-07 15:43:13'),
(195295, '2019-01-02 15:23:54'),
(192876, '2018-12-21 13:25:22'),
(192537, '2018-12-20 11:59:50'),
(192359, '2018-12-19 13:48:19'),
(192050, '2018-12-18 13:18:03'),
(192059, '2018-12-18 13:03:10'),
(192051, '2018-12-18 13:00:17'),
(191581, '2018-12-17 12:00:12'),
(191260, '2018-12-14 15:55:44'),
(190729, '2018-12-12 15:12:36');

INSERT INTO `Table2` (`object_id`, `term_id`) VALUES 
(195295, 568),
(192876, 568),
(192537, 568),
(192359, 568),
(192050, 568),
(192059, 568),
(192051, 568),
(191581, 568),
(191260, 568),
(190729, 568),
(190387, 568),
(190349, 568),
(189122, 568),
(188751, 568),
(187831, 568),
(187273, 568),
(187020, 568),
(187025, 568),
(185320, 568),
(185283, 568),
(195512, 1)  ,
(195260, 1)  ,
(195245, 1)  ,
(192628, 1)  ,
(192563, 1)  ,
(191224, 1)  ,
(189468, 1)  ,
(188520, 1)  ,
(187808, 1)  ,
(185682, 1)  ,
(183886, 1)  ,
(182668, 1)  ,
(182566, 1)  ,
(182194, 1)  ,
(180177, 1)  ,
(179738, 1)  ,
(179181, 1)  ,
(176889, 1)  ,
(176862, 1)  ,
(175258, 1)  ,
(195473, 564),
(192982, 564),
(192695, 564),
(192611, 564),
(192320, 564),
(192026, 564),
(191218, 564),
(190677, 564),
(190661, 564),
(190147, 564),
(189468, 564),
(189190, 564),
(189159, 564),
(189062, 564),
(188732, 564),
(188688, 564),
(188666, 564),
(188609, 564),
(188611, 564),
(188613, 564),
(195260, 91170),
(192676, 91170),
(192409, 91170),
(192057, 91170),
(191825, 91170),
(190968, 91170),
(190701, 91170),
(190450, 91170),
(190188, 91170),
(189484, 91170),
(189224, 91170),
(189011, 91170),
(188716, 91170),
(188522, 91170),
(187585, 91170),
(187297, 91170),
(187094, 91170),
(186788, 91170),
(185769, 91170),
(185577, 91170);

SELECT `ID`, `term_id`, `date`, `rnk`
FROM 
(
 SELECT DISTINCT(`ID`), `term_id`, `date`, RANK() OVER (PARTITION BY `channels`.`term_id` ORDER BY FIELD(`channels`.`term_id`, 1, 564, 91170, 568 ), `main`.`date` DESC) AS `rnk`
 FROM `Table1` AS `main`
 INNER JOIN `Table2` AS `channels` ON(`channels`.`object_id` = `main`.`ID`) AND `channels`.`term_id` IN (1, 564, 91170, 568) 
 GROUP BY (`ID`)) AS x 
WHERE `rnk` <= 3
| ID     | term_id | date                | rnk |
| ------ | ------- | ------------------- | --- |
| 195512 | 1       | 2019-02-27 12:28:18 | 1   |
| 195260 | 1       | 2019-01-02 15:26:45 | 2   |
| 195245 | 1       | 2019-01-02 14:22:57 | 3   |
| 195473 | 564     | 2019-01-03 10:52:01 | 1   |
| 192982 | 564     | 2018-12-21 19:55:42 | 2   |
| 192695 | 564     | 2018-12-20 15:22:41 | 3   |
| 195295 | 568     | 2019-01-02 15:23:54 | 1   |
| 192876 | 568     | 2018-12-21 13:25:22 | 2   |
| 192537 | 568     | 2018-12-20 11:59:50 | 3   |
| 192676 | 91170   | 2018-12-20 15:30:08 | 1   |
| 192409 | 91170   | 2018-12-19 15:44:59 | 2   |
| 192057 | 91170   | 2018-12-18 14:46:46 | 3   |
Strawberry
  • 33,750
  • 13
  • 40
  • 57
Igor Yavych
  • 4,166
  • 3
  • 21
  • 42
  • While it's good that you created fiddle, just pasting that link into your question does not make your question complete. We need to see sample data and the query you have already tried. Note that this problem is fairly ugly in MySQL 5.7. If you have a long term need for this type of query, consider upgrading to MySQL 8+. – Tim Biegeleisen May 14 '19 at 15:06
  • @TimBiegeleisen Unfortunately, upgrading to MySQL is out of question for some time, so I'm looking for a solution in 5.7, ugly or not, as long as it does what's expected and has an okay speed. – Igor Yavych May 14 '19 at 15:11
  • Please search around on this site, as your question has already been answered in at least a few other places. – Tim Biegeleisen May 14 '19 at 15:13
  • @TimBiegeleisen I sure did and failed to adapt any of them to my needs. – Igor Yavych May 14 '19 at 15:13
  • 1
    `SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));` No no no disabling is not something you should do.. I hope you know `ONLY_FULL_GROUP_BY` sql_mode is forcing you to write correct ANSI/ISO GROUP BY SQL and when not used you play russian roulette with you resultset?? – Raymond Nijland May 14 '19 at 15:15

1 Answers1

0

Exclusively for versions older than 8.0...

SET @prev = null;
SET @i = 0;

SELECT id
     , term_id
     , date
     , i
  FROM
     (SELECT x.id
           , y.term_id
           , x.date
           , CASE WHEN @prev = term_id THEN @i:=@i+1 ELSE @i:=1 END i
           , @prev:=term_id 
        FROM (SELECT DISTINCT id, date FROM table1) x 
        JOIN table2 y 
          ON y.object_id = x.ID 
       WHERE y.term_id IN (1, 564, 91170, 568) 
       ORDER 
          BY y.term_id
           , x.date DESC
     ) n
 WHERE i<=3
 ORDER 
    BY term_id,i;

    +--------+---------+---------------------+------+
    | id     | term_id | date                | i    |
    +--------+---------+---------------------+------+
    | 195512 |       1 | 2019-02-27 12:28:18 |    1 |
    | 195260 |       1 | 2019-01-02 15:26:45 |    2 |
    | 195245 |       1 | 2019-01-02 14:22:57 |    3 |
    | 195473 |     564 | 2019-01-03 10:52:01 |    1 |
    | 192982 |     564 | 2018-12-21 19:55:42 |    2 |
    | 192695 |     564 | 2018-12-20 15:22:41 |    3 |
    | 195295 |     568 | 2019-01-02 15:23:54 |    1 |
    | 192876 |     568 | 2018-12-21 13:25:22 |    2 |
    | 192537 |     568 | 2018-12-20 11:59:50 |    3 |
    | 195260 |   91170 | 2019-01-02 15:26:45 |    1 |
    | 192676 |   91170 | 2018-12-20 15:30:08 |    2 |
    | 192409 |   91170 | 2018-12-19 15:44:59 |    3 |
    +--------+---------+---------------------+------+

Note that the sample data set has duplicated data. This is oxymoronix in a well--structured RDBMS. So, you may want to fix that.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Thank you, but this doesn't produce expected results, it's not limited to N(3) items per group – Igor Yavych May 14 '19 at 15:22
  • Ah, I guess you have duplicate data somewhere. You need to chuck a DISTINCT in there in that case. – Strawberry May 14 '19 at 15:25
  • Can't quite figure out why, but on bigger dataset, ordering by date is wrong – Igor Yavych May 14 '19 at 15:38
  • There's a technical error in my solution. In practice, it's never been an issue in older versions of MySQL, but just in case, I'll edit... – Strawberry May 14 '19 at 15:41
  • It's not exactly duplicated data, an article can belong to multiple categories. Regardless, it's not possible to change DB structure in this case. Results I'm https://i.imgur.com/j92f9si.jpg – Igor Yavych May 14 '19 at 15:44
  • If the same row appears twice, it's duplicated data. This the danger of over-simplification. – Strawberry May 14 '19 at 15:46
  • Still getting incorrect date (seems to be ASC instead of DESC). Also, ID can be duplicated in table 1, not table2 – Igor Yavych May 14 '19 at 15:56
  • Please take a look at https://www.db-fiddle.com/f/oiTDi8BiDf5GJtUf2gcSDN/0. There is no sorting by date whatsoever – Igor Yavych May 14 '19 at 17:55