SQL query to group fetching other data from that row

Question

I have a schema as follows:

CREATE TABLE `a` (
  `id` char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `c_id` char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `d_id` char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO a VALUES ('1673b492fe9049a5bda9dcea56e9de6a', '1673b49303d04aadba461c27726f8cb8', '16328bc433604fe0af1329a0c2bc0312');

INSERT INTO a VALUES ('163b75aeafe0479aa426e506c687da91', '1673b49303d04aadba461c27726f8cb8', '15fbace119504b1db81bb6409b77fb26');

INSERT INTO a VALUES ('161ad54eb7f042c584881ed6dec31e68', '16328bc433604fe0af1329a0c2bc0312', '160705a0b0304a02b00ec47c2c84f99b');

SQLFiddle

I require a query which basically looks like select c_id, max(id), d_id from a group by c_id, in which the data is coming from the row which matches MAX(id)

Of course, this is not possible in the SQL mode which allows only full group by:

Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'a.d_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Is this possible, and how do I do it? Preferably without a join.

Make use of corelated subquery for matching data coming from the row with max() — Himanshu, Nov 23 '18 at 20:50
@MadhurBhaiya simply because I'm trying to remove all joins from the code in preparation for sharding — bear, Nov 24 '18 at 10:58
@bear ok What is your MySQL server version ? If it is 8.0 and above, we can use Window functions; in older version, we can either use correlated subqueries or user-defined variables. Please confirm the version; I will add more possible approaches. — Madhur Bhaiya, Nov 24 '18 at 11:02
@bear Added another version to the answer utilizing no join. Let know if it works. — Madhur Bhaiya, Nov 26 '18 at 14:43

Madhur Bhaiya · Accepted Answer · 2018-11-26T14:42:09.223

In a Derived Table, you can get the maximum values of id (latest id) for every c_id. You can then join this result-set to the main table, to get only the row corresponding to latest id.

View Query 1 on DB Fiddle

SELECT a.* 
FROM a 
JOIN (SELECT c_id, max(id) AS max_id 
      FROM a 
      GROUP BY c_id) AS dt
  ON dt.c_id = a.c_id AND 
     dt.max_id = a.id

Result

| id                               | c_id                             | d_id                             |
| -------------------------------- | -------------------------------- | -------------------------------- |
| 161ad54eb7f042c584881ed6dec31e68 | 16328bc433604fe0af1329a0c2bc0312 | 160705a0b0304a02b00ec47c2c84f99b |
| 1673b492fe9049a5bda9dcea56e9de6a | 1673b49303d04aadba461c27726f8cb8 | 16328bc433604fe0af1329a0c2bc0312 |

Since you are explicitly looking for a solution not utilizing JOIN, we can use User-defined variables. We will determine Row number within grouping of c_id with highest id value having row number 1. Eventually, we will consider only those rows where the row number value is 1. Check further explanation here: https://stackoverflow.com/a/53465139/2469308

Note that CROSS JOIN does not join to an actual table; instead it is a trick to initialize the user-variables within a single query.

One noteworthy point here is that your c_id column is defined with CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci. So, while initializing the variables we need to explicitly ensure the same character set and collation. Otherwise, we get an error of "Illegal mix of collations". This happens because the string variables are by-default initialized with utf8mb4_general_ci collation. We will use CAST(..) function to ensure the initialization with utf8mb4_unicode_ci instead.

SELECT
dt2.id, dt2.c_id, dt2.d_id 
FROM 
(
  SELECT
    dt1.id, dt1.c_id, dt1.d_id, 
    @rn := CASE WHEN @cid = dt1.c_id THEN @rn + 1
                WHEN @cid := dt1.c_id THEN 1
           END AS row_num 
  FROM   
  (
     SELECT
       id, c_id, d_id 
     FROM a 
     ORDER BY c_id, id DESC
  ) dt1
  CROSS JOIN 
  (
     SELECT 
       @rn := 0, 
       @cid := CAST('' AS CHAR(32) CHARACTER SET utf8mb4) COLLATE utf8mb4_unicode_ci 
  ) AS user_vars 
) dt2 
WHERE dt2.row_num = 1

View Query 2 on DB Fiddle

SQL query to group fetching other data from that row

1 Answers1