Assume you have a table named etl_change_fact with the following records.

issue_id  |  ingest_date          |  verb,status
10        |  2015-01-24 00:00:00  |  1,1
10        |  2015-01-25 00:00:00  |  2,2
10        |  2015-01-26 00:00:00  |  2,3
10        |  2015-01-27 00:00:00  |  3,4
11        |  2015-01-10 00:00:00  |  1,3
11        |  2015-01-11 00:00:00  |  2,4

I need the following results

10         2015-01-26 00:00:00    2,3
11         2015-01-11 00:00:00    2,4

I am trying out this query

select * 
from etl_change_fact 
where ingest_date = (select max(ingest_date) 
                     from etl_change_fact);

However, this returns only the following record:

10    2015-01-26 00:00:00    2,3

But I want one record per issue_id, chosen by:

(a) max(ingest_date), AND

(b) the priority of the verb column (2 is preferred first, 1 second, and 3 last)

Hence, I need the following results

10    2015-01-26 00:00:00    2,3
11    2015-01-11 00:00:00    2,4

Please help me query this efficiently.

P.S.: I am not going to index ingest_date, because I plan to set it as the "distribution key" in a distributed computing setup. I am new to data warehousing and querying.

Hence, please help me with an optimized way to query my TB-sized database.

ramya
  • I don't understand the problem. max(ingest_date) is "2015-01-26 00:00:00", so condition a) will get you only that record. What does your condition (b) mean? Can you rephrase it, because I don't get how your description relates to the records you want returning. Finally - mysql or postgresql? – mlinth Feb 03 '15 at 11:26
  • I need for postgreSQL – ramya Feb 03 '15 at 12:01
  • @mlinth: Both my conditions are valid. Please review my sample data. If the record with max(ingest_date) has verb 3, I don't want that record. I want the one before it (I mean the latest one that doesn't have verb 3) – ramya Feb 03 '15 at 12:03

1 Answer

This is a typical "greatest-n-per-group" problem. If you search for this tag here, you'll get plenty of solutions, including ones for MySQL.

For Postgres the quickest way to do it is using distinct on (which is a Postgres-specific extension to the SQL language):

select distinct on (issue_id) issue_id, ingest_date, verb, status
from etl_change_fact
order by issue_id, 
         case verb 
            when 2 then 1 
            when 1 then 2
            else 3
         end, ingest_date desc;
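
If you would rather avoid the Postgres-specific syntax, the same "pick one row per group" idea can be sketched with the standard row_number() window function. This is only an illustration, not a tuned solution: it assumes Postgres 8.4 or later (window functions are not available in older versions), and the names rn and ranked are made up for this example.

```sql
-- Sketch: one row per issue_id, preferring verb 2, then 1, then anything else,
-- and the newest ingest_date within the same verb priority.
-- Assumes window function support (Postgres 8.4+).
select issue_id, ingest_date, verb, status
from (
    select f.*,
           row_number() over (
               partition by issue_id
               order by case verb
                           when 2 then 1   -- most preferred
                           when 1 then 2
                           else 3          -- verb 3 (and any other value) last
                        end,
                        ingest_date desc   -- newest row first within a priority
           ) as rn                         -- hypothetical column name
    from etl_change_fact f
) ranked                                   -- hypothetical alias
where rn = 1;
```

The ordering inside over (...) is the same as in the distinct on query, so both should pick the same row for each issue_id.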

You can also extend your original query with a correlated subquery to get the latest row per issue_id (note that this version does not take the verb priority into account):

select f1.* 
from etl_change_fact f1
where f1.ingest_date = (select max(f2.ingest_date) 
                        from etl_change_fact f2
                        where f1.issue_id = f2.issue_id);

Edit

For an outdated and unsupported Postgres version, you can probably get away with something like this:

select f1.* 
from etl_change_fact f1
where f1.ingest_date = (select f2.ingest_date
                        from etl_change_fact f2
                        where f1.issue_id = f2.issue_id
                        order by case f2.verb 
                                  when 2 then 1 
                                  when 1 then 2
                                  else 3
                              end, f2.ingest_date desc
                        limit 1);

SQLFiddle example: http://sqlfiddle.com/#!15/3bb05/1
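
In case the fiddle is not reachable, the sample data from the question can be recreated roughly like this. The column types are my assumption (the question does not state them), and I split the "verb,status" values into two separate columns as the queries above expect.

```sql
-- Minimal test setup; column types are assumed, not taken from the question.
create table etl_change_fact (
    issue_id    integer,
    ingest_date timestamp,
    verb        integer,
    status      integer
);

-- One insert per row, so this also runs on versions without multi-row VALUES.
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (10, '2015-01-24 00:00:00', 1, 1);
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (10, '2015-01-25 00:00:00', 2, 2);
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (10, '2015-01-26 00:00:00', 2, 3);
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (10, '2015-01-27 00:00:00', 3, 4);
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (11, '2015-01-10 00:00:00', 1, 3);
insert into etl_change_fact (issue_id, ingest_date, verb, status) values (11, '2015-01-11 00:00:00', 2, 4);
```

With this data, the queries that order by the verb priority should return the 2015-01-26 row for issue 10 and the 2015-01-11 row for issue 11, which matches the result asked for in the question.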

  • Thank you, horse. In addition to the ingest_date condition, I would also like to add the 'verb' condition. Please help if you can. – ramya Feb 03 '15 at 12:06
  • I am getting this error: "ERROR: SELECT DISTINCT ON is not supported". My PostgreSQL version is 8.0.2 – ramya Feb 03 '15 at 12:11
  • @user2893206: 8.0 has not been maintained for over 5 years and is 10 years old. The first thing you have to do is upgrade to a supported and current version (e.g. 9.4). –  Feb 03 '15 at 12:12
  • Sorry. I understand. I actually meant like "please help me if you have any idea" – ramya Feb 03 '15 at 12:15
  • Thank you. But the above query is giving me all the records. – ramya Feb 03 '15 at 12:27
  • @ramya: Sorry, there was a typo in my query. –  Feb 03 '15 at 14:02