Group exactly four rows sql

Question

I have a table with Transactions, amongst whose columns are id, created_at, and company_id. I'd like to group the four first transactions of every company and return the created_at values of each transaction on each row.

In other words, I want each row of my output to correspond to the four first transactions of each company (so grouping by company_id) with columns showing me the company_id and the created_at of each of those four transactions.

How do I do that?

Sample data:

  id  | company_id | created_at
---------------------------------
 1123 |    abcd    | 10/12/2015
 8291 |    abcd    | 10/14/2015
 9012 |    abcd    | 10/15/2015
 9540 |    abcd    | 10/16/2015
10342 |    abcd    | 10/21/2015
10456 |    abcd    | 10/22/2015
 2301 |    efgh    | 10/13/2015
 4000 |    efgh    | 11/01/2015
 4023 |    efgh    | 11/03/2015
 6239 |    efgh    | 11/08/2015
 7500 |    efgh    | 11/14/2015

Sample output:

  company_id | created_at_1 | created_at_2 | created_at_3 | created_at_4
--------------------------------------------------------------------------
     abcd    |  10/12/2015  |  10/14/2015  |  10/15/2015  |  10/16/2015
     efgh    |  10/13/2015  |  11/01/2015  |  11/03/2015  |  11/08/2015

Add sample table data and expected result. Also tag dbms used, and show us your query attempt. — jarlh, Jan 22 '16 at 13:28
Added data and result, I'm not sure what you mean with "tag dbms used", and I haven't attempted a query yet, I'm not sure how I'd start. — Pedro Carvalho, Jan 22 '16 at 13:40
see this sql http://stackoverflow.com/questions/7074750/sql-query-help-10-records-for-each-distinct-column-value — developerCK, Jan 22 '16 at 13:41
While this can certainly be done with a stored procedure, there seems to be no benefit of doing it this way. I can think of a way to do it with just one subquery in PostreSQL, but only because it supports arrays and has the `array_agg` function. — coladict, Jan 22 '16 at 13:47
Nothing similar on MySQL? @developerCK: I'm not entirely sure how to adapt that question's query to my case. — Pedro Carvalho, Jan 22 '16 at 15:06
Luckily, each company has only 4 transactions, so SELECT * FROM my_table will suffice — Strawberry, Jan 22 '16 at 15:25

Strawberry · Accepted Answer · 2016-01-22T16:54:53.963

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,company_id VARCHAR(12) NOT NULL
,created_at DATE NOT NULL
);

INSERT INTO my_table VALUES
( 1123,'abcd','2015/10/12'),
( 8291,'abcd','2015/10/14'),
( 9012,'abcd','2015/10/15'),
( 9540,'abcd','2015/10/16'),
(10342,'abcd','2015/10/21'),
(10456,'abcd','2015/10/22'),
( 2301,'efgh','2015/10/13'),
( 4000,'efgh','2015/11/01'),
( 4023,'efgh','2015/11/03'),
( 6239,'efgh','2015/11/08'),
( 7500,'efgh','2015/11/14');

SELECT x.* 
  FROM my_table x 
  JOIN my_table y 
    ON y.company_id = x.company_id 
   AND y.created_at <= x.created_at 
 GROUP 
    BY x.id 
HAVING COUNT(*) <= 4 
 ORDER 
    BY company_id
     , created_at;
+------+------------+------------+
| id   | company_id | created_at |
+------+------------+------------+
| 1123 | abcd       | 2015-10-12 |
| 8291 | abcd       | 2015-10-14 |
| 9012 | abcd       | 2015-10-15 |
| 9540 | abcd       | 2015-10-16 |
| 2301 | efgh       | 2015-10-13 |
| 4000 | efgh       | 2015-11-01 |
| 4023 | efgh       | 2015-11-03 |
| 6239 | efgh       | 2015-11-08 |
+------+------------+------------+

A solution with variables will be orders of magnitude faster, e.g...

SELECT a.id
     , a.company_id
     , a.created_at
  FROM 
     ( SELECT x.*
            , CASE WHEN @prev = x.company_id THEN @i:=@i+1 ELSE @i:=1 END i, @prev:=x.company_id prev 
         FROM my_table x
            , (SELECT @i:=1,@prev:=null) vars 
        ORDER 
           BY x.company_id
            , x.created_at
     ) a
 WHERE i <= 4;

That query has not stopped running yet. It may be relevant that my table has 2 million rows. — Pedro Carvalho, Jan 22 '16 at 16:27
A composite index on (company_id,created_at) would probably help. Also, rewriting it with variables would speed things up. — Strawberry, Jan 22 '16 at 16:43

score 0 · Answer 2 · edited May 23 '17 at 12:23

One possible way is the following:

select company_id, 
    min(created_at) as created_at_1,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 1) as created_at_2,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 2) as created_at_3,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 3) as created_at_4
from t as t1
group by company_id

EDIT:

Another possibility (inspired by this answer) is:

select company_id, 
    min(created_at) as created_at_1,
    min(case r when 2 then created_at else null end) as created_at_2,
    min(case r when 3 then created_at else null end) as created_at_3,
    min(case r when 4 then created_at else null end) as created_at_4
from (
    select company_id, created_at,
        (case company_id when @curType 
        then @curRank := @curRank + 1
        else @curRank := 1 and @curType := company_id end)+1 as r
    from t, (select @curRank := 0, @curType := '') f
    order by company_id, created_at
) as o
where r <= 4
group by company_id

This query has been running for 10 minutes and still hasn't returned anything. — Pedro Carvalho, Jan 22 '16 at 15:33
@PedroCarvalho: It worked on the sample data, but may be slow on larger data sets. I have updated my answer with another alternative, but it may also take some time. — Richard Tingstad, Jan 22 '16 at 16:51

score -1 · Answer 3 · answered Jan 22 '16 at 17:09

Could by like this maybe?

   SELECT S.company_id,
          A.created_at created_at_1,
          B.created_at created_at_2,
          C.created_at created_at_3,
          D.created_at created_at_4
     FROM sample S
LEFT JOIN sample A on S.company_id = A.company_id AND A.id NOT IN(S.id)
LEFT JOIN sample B on S.company_id = B.company_id AND B.id NOT IN(S.id, A.id)
LEFT JOIN sample C on S.company_id = C.company_id AND C.id NOT IN(S.id, A.id, B.id)
LEFT JOIN sample D on S.company_id = D.company_id AND D.id NOT IN(S.id, A.id, B.id, C.id)
 GROUP BY S.company_id

http://sqlfiddle.com/#!9/c577e/3

It might not be very efficient, though.

And they are not in order, because your American date format is not good to be sorted. Better switch to TIMESTAMP format.

Group exactly four rows sql

3 Answers3