0

I have a table like this (simplified version):

+------+-------+-----+--------------+-----+
| id   | name  | age | company.name | ...   
+------+-------+-----+--------------------+
| 1    | Adam  | 21  |  Google      | ...                  
| 3    | Peter | 20  |  Apple       | ...                 
| 2    | Bob   | 20  |  Microsoft   | ...    
| 9    | Alice | 18  |  Google      | ... 
+------+-------+-----+--------------------+

I need groups data with counting rows by any one column. And I need to get first row in each group. User select which column will be used to group.

If user select column age to group then results:

+------+------------+-------+
| id   | group_name | count | 
+------+------------+-------+
| 9    | 18         |  1    |
+------+------------+-------+
| 2    | 20         |  2    |
+------+------------+-------+
| 1    | 21         |  1    |
+------+------------+-------+

Column to group may be numeric or string.

Currently I does it by this query:

SELECT id, group_name, users_name, count(id) as count FROM (
 SELECT persons.id as id, company.type as group_name, users.name as users_name 
 FROM persons  
 LEFT JOIN company on company.id = persons.company_id 
 LEFT JOIN position on position.id=persons.position_id 
 ...
 LEFT JOIN source on source.id=persons.source_id 
 WHERE ...  
 ORDER BY if(company.type = '' or company.type is null,1,0) ASC,
 company.type ASC, IF(persons.status = '' or persons.status is null,1,0) ASC, 
 persons.status ASC, persons.id
) t1 GROUP BY group_name

but with new version mysql this SQL stoped works I think that order is ignored in sub-select.

I know that similar topics was wroted, but proposed solutions not working with my query. I have to join many tables, add multiple conditions and use cascade order and then select first row from each group. I will be very happy if solution will be optimised for performace.

---- EDIT ----

Proposed solution: SQL select only rows with max value on a column

which suggest to use MAX() and GROUP BY not working well. For two reason

  1. If grouped column include string, then query return not first row, but last row in each group.
  2. If my dataset has a cascade order, I can not use MAX in a few columns at the same time.

I created sqlfiddle which include exact example.

http://sqlfiddle.com/#!9/23225d/11/0

-- EXAMPLE 1 - Group by string 
-- base query
SELECT persons.*, company.* FROM persons 
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.name ASC, company.id ASC;

--   grouping query
SELECT MAX(persons.id) as id, company.name, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.name
ORDER BY company.name ASC, persons.id ASC;

-- The results will be: 
-- |ID | NAME     | COUNT|
-- |1  | Google   | 2    |
-- |3  | Microsoft| 3    |

-- EXAMPLE 2 - Cascade order
-- base query
SELECT persons.*, company.* FROM persons 
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.type ASC, persons.status ASC;

--  grouping query 
SELECT MAX(persons.id) as id, company.type, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.type
ORDER BY company.type ASC, persons.status ASC;

-- The results will be: 
-- |ID | NAME| COUNT|
-- |3  |  1  |   2  |
-- |2  |  2  |   3  |
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • `ORDER BY` has never been the correct way to get the first row of a group. If it was working before, it was just by accident. – Barmar Mar 02 '18 at 01:17
  • I added explain, because this is not duplicat with this [link](https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column) – user2971916 Mar 02 '18 at 01:55
  • @Barmar I know that ORDER BY is bad idea for do this, but I couldn't solve this problem another way. – user2971916 Mar 02 '18 at 01:58
  • Why isn't it a duplicate? You want to get the first row of each group, that's what that question shows how to do. If you also want to get the count, just add that to the subquery that gets the minimum value of the column you're ordering by. – Barmar Mar 02 '18 at 05:41
  • Try to figure it out from that question. If you can't get it working, post what you tried in this question and I'll reopen. – Barmar Mar 02 '18 at 05:42
  • @Barmar thanks for your suggest, I created full example and explained why suggest solution not work. – user2971916 Mar 02 '18 at 13:09
  • "then query return not first row, but last row in each group" -- if you can't see that you simply change `MAX()` to `MIN()`, you're not cut out to be a programmer. – Barmar Mar 02 '18 at 16:46
  • Okay, but how solve problem with cascade order? – user2971916 Mar 02 '18 at 18:09

1 Answers1

0

Just change MAX() to MIN() to get the first row instead of the last row in each group.

To get the extreme values of cascading columns, see SQL : Using GROUP BY and MAX on multiple columns. Use that in the subquery part of the query to get the row containing those extremes, as in SQL select only rows with max value on a column.

So the form of the full query is:

SELECT t1.id, t1.grouped_column, t2.count
FROM yourTable AS t
JOIN (SELECT t3.grouped_column, t3.order_column1, MIN(t4.order_column2) AS order_column2, SUM(t3.count) AS count
      FROM (SELECT grouped_column, MIN(order_column1) AS order_column1, COUNT(*) AS count
            FROM yourTable
            GROUP BY grouped_column) AS t3
      JOIN yourTable AS t4 
      ON t3.grouped_column = t4.grouped_column AND t3.order_column1 = t4.order_column1
      GROUP BY t4.grouped_column, t4.order_column1) AS t2
ON t1.grouped_column = t2.grouped_column AND t1.ordered_column1 = t2.order_column1 AND t1.order_column2 = t2.order_column2

Since you want to operate on a join, I suggest you define a view that uses the join. Then you can use that view in place of yourTable in the above query.

Barmar
  • 741,623
  • 53
  • 500
  • 612