0

I'm Using Amazon Redshift. I need to get the MAX Date in the column by Month wise. The Example is as below.

There are 5 tables:

vendor
vendor_pkg
vendor_pkg_category
vendor_load
vendor_load_status

vendor  V

vendor_id   vendor_name
-----------------------
1            L&T
2            Reuters
3            IBM
4            INfosys

vendor_pkg  VP

vendor_pkg_id  vendor_pkg_category_id   vendor_pkg_name  vendor_id
------------------------------------------------------------------
1              1                        Futures          1
2              1                        Fairvalue        1
3              3                        Equities         1
4              2                        MBS              1
5              2                        INTL Price       2
6              4                        Muni             2

vendor_pkg_category  VPC

vendor_pkg_category_id  category_name
-------------------------------------
1                       Price
2                       Security
3                       Rating
4                       value

Vendor_load  VL

vendor_load_id  eval_date   load_status_id  vendor_pkg_id
---------------------------------------------------------
1               2014-06-05  1               1
2               2014-06-20  1               1
3               2014-07-05  2               2
4               2014-07-20  1               2
5               2014-06-05  2               3
6               2014-06-20  2               3
7               2014-07-05  1               4
8               2014-07-20  2               4

vendor_load_status  VLS

load_status_id  load_status_name
--------------------------------
1               Success
2               Failed

Result table should be like this:

v.vendor  vpc.category_name  vp.ven_pkg_name  vl.eval_date  vls.status_name
---------------------------------------------------------------------------
L&T       Price              futures          2014-06-20    Success
L&T       Price              fairvalue        2014-07-20    Success
L&T       Security           MBS              2014-07-20    Failed
L&T       Rating             Equities         2014-06-20    Failed

I use the following query. But it displays the data for one month only:

SELECT DISTINCT v.vendor_name AS vendor,
       vpc.category_name AS V_Type,
       vp.vendor_pkg_name AS Package_name,
       vl.eval_date AS C_Date,
       vls.load_status_name AS Status
FROM ces_idw.vendor v,
     ces_idw.vendor_pkg_category vpc,
     ces_idw.vendor_load vl,
     ces_idw.vendor_pkg vp,
     ces_idw.vendor_load_status vls
WHERE (vl.eval_date) IN (SELECT DISTINCT MAX(vl.eval_date)
                         FROM ces_idw.vendor_load vl
                         WHERE v.vendor_id = vp.vendor_id
                         and v.vendor_name = 'IDC'
                         AND   vp.vendor_pkg_id = vl.vendor_pkg_id
                         AND   TO_CHAR(vl.eval_date,'yyyy-mm') = '2014-06'
                         GROUP BY vl.vendor_pkg_id,
                                  v.vendor_name)                               
AND   vp.vendor_pkg_category_id = vpc.vendor_pkg_category_id
AND   vp.vendor_pkg_id = vl.vendor_pkg_id
AND   vl.load_status_id = vls.load_status_id
ORDER BY vp.vendor_pkg_name

when I use TO_CHAR(vl.eval_date,'yyyy-mm')between '2014-06' and '2014-07' it shows the result for '2014-07'.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Benny
  • 432
  • 1
  • 6
  • 21

3 Answers3

1

As per your sample data i wrote query this gives you mentioned result set

DECLARE @exp table (ID INT,Name VARCHAR(10))
INSERT INTO @exp (ID,Name) VALUES (1,'PRICE')
INSERT INTO @exp (ID,Name) VALUES (2,'STOCK')
INSERT INTO @exp (ID,Name) VALUES (3,'INCOME')
INSERT INTO @exp (ID,Name) VALUES (4,'LOAD')
INSERT INTO @exp (ID,Name) VALUES (5,'INITIAL')

DECLARE @exp1 table (ID INT,PID INT,Name VARCHAR(10),Dated Date)
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (1,1,'PRICE','2014-08-05')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (2,1,'PRICE','2014-08-09')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (3,2,'STOCK','2014-08-05')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (4,2,'STOCK','2014-08-05')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (5,3,'INCOME','2014-08-10')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (6,3,'INCOME','2014-08-20')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (7,4,'LOAD','2014-08-10')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (8,4,'LOAD','2014-08-19')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (9,5,'INITIAL','2014-08-05')
INSERT INTO @exp1 (ID,PID,Name,Dated) VALUES (10,5,'INITIAL','2014-08-05')

SELECT DISTINCT groupedtt.ID,groupedtt.PID,tt.Name,groupedtt.MaxDateTime
FROM @exp tt
INNER JOIN
    (SELECT ID,PId, MAX(dated) AS MaxDateTime,DENSE_RANK()OVER (PARTITION BY PID ORDER BY ID )RN
    FROM @exp1
    GROUP BY PId,ID) groupedtt 
ON tt.id = groupedtt.PId AND 
RN = 2
mohan111
  • 8,633
  • 4
  • 28
  • 55
  • one doubt. if i retrieve data from one more table will it retrieve. eg the link between t1 and t2 is by t3. – Benny Jul 17 '14 at 14:40
  • yeah we can write another inner join for another table and can get the required output and lot more depends on table structure @benpep – mohan111 Jul 17 '14 at 15:19
  • 1
    That's not valid syntax for Postgres –  Jul 19 '14 at 09:24
1

I Found the Answer for my Question.

SELECT DISTINCT v.vendor_name AS vendor,
       vpc.category_name AS V_Type,
       vp.vendor_pkg_name AS Package_name,
       vl.eval_date AS C_Date,
       vls.load_status_name AS Status
FROM ces_idw.vendor v,
     ces_idw.vendor_pkg_category vpc,
     ces_idw.vendor_load vl,
     ces_idw.vendor_pkg vp,
     ces_idw.vendor_load_status vls
WHERE  (vl.eval_date) IN (
                         SELECT DISTINCT MAX(vl.eval_date)
                         FROM  ces_idw.vendor_load vl
                         WHERE v.vendor_id = vp.vendor_id
                         AND   v.vendor_name = 'L&T'
                         AND   vp.vendor_pkg_id = vl.vendor_pkg_id
                         AND  (TO_CHAR(vl.eval_date,'yyyy-mm') between '2013-01' and '2015-12')
                         GROUP BY extract(month from vl.eval_date),vl.vendor_pkg_id, v.vendor_name
                         ) 
AND   vp.vendor_pkg_category_id = vpc.vendor_pkg_category_id
AND   vp.vendor_pkg_id = vl.vendor_pkg_id
AND   vl.load_status_id = vls.load_status_id
ORDER BY vp.vendor_pkg_name

Thanks To Everyone

Benny
  • 432
  • 1
  • 6
  • 21
1

Your currently accepted solution doesn't seem right.
My educated guess is that you want:
Details for the latest row per month for every product of a given vendor.

SELECT DISTINCT ON (v.vendor_id, vl.vendor_pkg_id
                  , date_trunc('month', vl.eval_date))
       v.vendor_name        AS vendor
     , vpc.category_name    AS v_type
     , vp.vendor_pkg_name   AS package_name
     , vl.eval_date         AS c_date
     , vls.load_status_name AS status
FROM   ces_idw.vendor              v
JOIN   ces_idw.vendor_pkg          vp  USING (vendor_id)
JOIN   ces_idw.vendor_load         vl  USING (vendor_pkg_id)
JOIN   ces_idw.vendor_load_status  vls USING (load_status_id)
JOIN   ces_idw.vendor_pkg_category vpc USING (vendor_pkg_category_id)
WHERE  v.vendor_name = 'L&T'
AND    vl.eval_date BETWEEN '2013-01-01' AND '2015-12-31'
ORDER  BY v.vendor_id, vl.vendor_pkg_id
        , date_trunc('month', vl.eval_date), vl.eval_date DESC;

Major points

  • Returns what I described. Much simpler and faster than your current solution. And probably correct.

  • Use explicit JOIN syntax, much clearer. Even simpler with the USING clause, which your naming convention conveniently allows.

  • Use DISTINCT ON, available since .. forever in Postgres.
    date_trunc() instead of extract() keeps all months separated. Wouldn't make much sense to group months of multiple years together.
    The added item vl.eval_date DESC in the ORDER BY clause picks the latest row per month.
    Detailed explanation for DISTINCT ON:
    Select first row in each GROUP BY group?

  • Since it is unclear whether vendor_name is unique, I included vendor_id in the DISTINCT ON and ORDER BY clauses accordingly.

  • Do not transform your date column to text in the WHERE clause, this is expensive nonsense and makes it impossible to use simple indexes. The resulting expression is not sargable.

  • Do not use the multiply twisted and expensive subquery, connected IN. Completely replaced by DISTINCT ON.
    In particular, combining DISTINCT with MAX(vl.eval_date) doesn't make sense.

  • To make this fast you only need indexes on (Vendor_load.eval_date) and (vendor.vendor_name) - in addition to the obvious primary keys and indexes on foreign key columns.

All links to the manual for Postgres 8.0.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for the answer But DISTINCT ON is not working in my PostgreSQL. That's Why i Wrote the code which i post. @Erwin Brandstetter – Benny Jul 24 '14 at 05:08
  • @benpep: I suggest you finally declare what you are working with - like you should have done to begin with. Is it Redshift? What do you get from `SELECT version()`. Edit the question with this basic information. – Erwin Brandstetter Jul 24 '14 at 15:09
  • yes. its my mistake. it shows. PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 (Red Hat 3.4.2-6.fc3), Redshift 1.0.797 – Benny Jul 25 '14 at 05:23