0

I have a table with Transactions, amongst whose columns are id, created_at, and company_id. I'd like to group the four first transactions of every company and return the created_at values of each transaction on each row.

In other words, I want each row of my output to correspond to the four first transactions of each company (so grouping by company_id) with columns showing me the company_id and the created_at of each of those four transactions.

How do I do that?

Sample data:

  id  | company_id | created_at
---------------------------------
 1123 |    abcd    | 10/12/2015
 8291 |    abcd    | 10/14/2015
 9012 |    abcd    | 10/15/2015
 9540 |    abcd    | 10/16/2015
10342 |    abcd    | 10/21/2015
10456 |    abcd    | 10/22/2015
 2301 |    efgh    | 10/13/2015
 4000 |    efgh    | 11/01/2015
 4023 |    efgh    | 11/03/2015
 6239 |    efgh    | 11/08/2015
 7500 |    efgh    | 11/14/2015

Sample output:

  company_id | created_at_1 | created_at_2 | created_at_3 | created_at_4
--------------------------------------------------------------------------
     abcd    |  10/12/2015  |  10/14/2015  |  10/15/2015  |  10/16/2015
     efgh    |  10/13/2015  |  11/01/2015  |  11/03/2015  |  11/08/2015
Skatox
  • 4,237
  • 12
  • 42
  • 47
Pedro Carvalho
  • 565
  • 1
  • 6
  • 26
  • Add sample table data and expected result. Also tag dbms used, and show us your query attempt. – jarlh Jan 22 '16 at 13:28
  • Added data and result, I'm not sure what you mean with "tag dbms used", and I haven't attempted a query yet, I'm not sure how I'd start. – Pedro Carvalho Jan 22 '16 at 13:40
  • see this sql http://stackoverflow.com/questions/7074750/sql-query-help-10-records-for-each-distinct-column-value – developerCK Jan 22 '16 at 13:41
  • While this can certainly be done with a stored procedure, there seems to be no benefit of doing it this way. I can think of a way to do it with just one subquery in PostreSQL, but only because it supports arrays and has the `array_agg` function. – coladict Jan 22 '16 at 13:47
  • Nothing similar on MySQL? @developerCK: I'm not entirely sure how to adapt that question's query to my case. – Pedro Carvalho Jan 22 '16 at 15:06
  • Luckily, each company has only 4 transactions, so SELECT * FROM my_table will suffice – Strawberry Jan 22 '16 at 15:25

3 Answers3

1
DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,company_id VARCHAR(12) NOT NULL
,created_at DATE NOT NULL
);

INSERT INTO my_table VALUES
( 1123,'abcd','2015/10/12'),
( 8291,'abcd','2015/10/14'),
( 9012,'abcd','2015/10/15'),
( 9540,'abcd','2015/10/16'),
(10342,'abcd','2015/10/21'),
(10456,'abcd','2015/10/22'),
( 2301,'efgh','2015/10/13'),
( 4000,'efgh','2015/11/01'),
( 4023,'efgh','2015/11/03'),
( 6239,'efgh','2015/11/08'),
( 7500,'efgh','2015/11/14');

SELECT x.* 
  FROM my_table x 
  JOIN my_table y 
    ON y.company_id = x.company_id 
   AND y.created_at <= x.created_at 
 GROUP 
    BY x.id 
HAVING COUNT(*) <= 4 
 ORDER 
    BY company_id
     , created_at;
+------+------------+------------+
| id   | company_id | created_at |
+------+------------+------------+
| 1123 | abcd       | 2015-10-12 |
| 8291 | abcd       | 2015-10-14 |
| 9012 | abcd       | 2015-10-15 |
| 9540 | abcd       | 2015-10-16 |
| 2301 | efgh       | 2015-10-13 |
| 4000 | efgh       | 2015-11-01 |
| 4023 | efgh       | 2015-11-03 |
| 6239 | efgh       | 2015-11-08 |
+------+------------+------------+

A solution with variables will be orders of magnitude faster, e.g...

SELECT a.id
     , a.company_id
     , a.created_at
  FROM 
     ( SELECT x.*
            , CASE WHEN @prev = x.company_id THEN @i:=@i+1 ELSE @i:=1 END i, @prev:=x.company_id prev 
         FROM my_table x
            , (SELECT @i:=1,@prev:=null) vars 
        ORDER 
           BY x.company_id
            , x.created_at
     ) a
 WHERE i <= 4;
Strawberry
  • 33,750
  • 13
  • 40
  • 57
0

One possible way is the following:

select company_id, 
    min(created_at) as created_at_1,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 1) as created_at_2,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 2) as created_at_3,
    (select created_at from t where company_id=t1.company_id order by created_at limit 1 offset 3) as created_at_4
from t as t1
group by company_id

EDIT:

Another possibility (inspired by this answer) is:

select company_id, 
    min(created_at) as created_at_1,
    min(case r when 2 then created_at else null end) as created_at_2,
    min(case r when 3 then created_at else null end) as created_at_3,
    min(case r when 4 then created_at else null end) as created_at_4
from (
    select company_id, created_at,
        (case company_id when @curType 
        then @curRank := @curRank + 1
        else @curRank := 1 and @curType := company_id end)+1 as r
    from t, (select @curRank := 0, @curType := '') f
    order by company_id, created_at
) as o
where r <= 4
group by company_id
Community
  • 1
  • 1
-1

Could by like this maybe?

   SELECT S.company_id,
          A.created_at created_at_1,
          B.created_at created_at_2,
          C.created_at created_at_3,
          D.created_at created_at_4
     FROM sample S
LEFT JOIN sample A on S.company_id = A.company_id AND A.id NOT IN(S.id)
LEFT JOIN sample B on S.company_id = B.company_id AND B.id NOT IN(S.id, A.id)
LEFT JOIN sample C on S.company_id = C.company_id AND C.id NOT IN(S.id, A.id, B.id)
LEFT JOIN sample D on S.company_id = D.company_id AND D.id NOT IN(S.id, A.id, B.id, C.id)
 GROUP BY S.company_id

http://sqlfiddle.com/#!9/c577e/3

It might not be very efficient, though.

And they are not in order, because your American date format is not good to be sorted. Better switch to TIMESTAMP format.

yunzen
  • 32,854
  • 11
  • 73
  • 106