0

I have searched around and didn't find the solution. (A useful reading SQL to find time elapsed from multiple overlapping intervals)

Here is my data: the rules are

"For each country, select the Date_ID in sequence with time interval equal or greater than 5 months"

My working environment is ORACLE SQL.

Thanks a lot.

Country       Date_ID
----------------------
USA           199003
USA           200004
USA           200005
USA           200009
USA           200010
UK            199307
UK            199308
UK            199408

Therefore the output should be

 Country    Date_ID
 --------------------
 USA        199003
 USA        200004
 USA        200009
 UK         199307
 UK         199408
user1503
  • 70
  • 8
  • 1
    Hi Jarlh, thanks for the comment. I am not sure if my question is confusing. Actually I am trying to select the Date "In sequential". Take USA an example, 1990/03 and 2000/04 time interval is greater than 5 months. However, 2000/05 to 2000/04 is not within 5 moths, therefore 2000/05 is not selected. – user1503 Nov 27 '18 at 20:19
  • 1
    What is your Oracle version? The problem is easy to solve in Oracle 12.1 and higher, with `match_recognize`; harder in older versions. –  Nov 27 '18 at 20:19
  • Also, what is the data type of `DATE_ID`? Are they *strings* or are they in the proper data type, which is `date`? –  Nov 27 '18 at 20:20
  • @mathguy, you're right... – jarlh Nov 27 '18 at 20:24
  • @mathguy My oracle is 11.2 DATE_ID is NUMBER(6). If you know some solution in this version, that is great. If not, any suggestion/document in Oracle 12.1 or Mysql.etc is really appreciated. – user1503 Nov 27 '18 at 20:27
  • So then, there are no forward slashes in DATE_ID? You shouldn't show them in your sample data (and sample desired result). –  Nov 27 '18 at 20:29
  • @mathguy sorry for the confusion. I just clean the input data. – user1503 Nov 27 '18 at 20:32
  • Why would 200009 be included? It is 4 months ahead of 200005. How does that fit the rule? – Gordon Linoff Nov 27 '18 at 20:53
  • @GordonLinoff - I already explained how that fits the rules, in a comment to your answer. –  Nov 27 '18 at 20:56

1 Answers1

1

Here is one way to solve this, which will work at least as far back as Oracle 10.2. It uses analytic functions and a hierarchical query.

The WITH clause is there just to build the sample data on the fly. You don't need it - remove it, and use your actual table and column names in the query. (In the WITH clause I declared the columns after the CTE name, which works only in Oracle 11.2 and higher, but the WITH clause is not part of the solution, so I wouldn't worry about that.)

with
  sample_data (country, date_id) as (
    select 'USA', 199003 from dual union all
    select 'USA', 200004 from dual union all
    select 'USA', 200005 from dual union all
    select 'USA', 200009 from dual union all
    select 'USA', 200010 from dual union all
    select 'UK' , 199307 from dual union all
    select 'UK' , 199308 from dual union all
    select 'UK' , 199408 from dual
  )
select country, date_id
from   (
         select country, date_id,
                row_number() over (partition by country order by dt) as rn,
                count(*) over (partition by country order by dt
                  range between current row 
                            and interval '4' month following) as ct
         from   (
                  select country, date_id, 
                         to_date(to_char(date_id, 'fm999999'), 'yyyymm') as dt
                  from   sample_data
                )
       )
start with rn = 1
connect by country = prior country and rn = prior rn + prior ct
;

COUNTRY    DATE_ID
------- ----------
UK          199307
UK          199408
USA         199003
USA         200004
USA         200009

For comparison, here is a match_recognize solution, which requires Oracle 12.1 or higher:

select country, date_id
from   (
         select country, date_id, 
                to_date(to_char(date_id, 'fm999999'), 'yyyymm') dt
         from   sample_data
       )
match_recognize(
  partition by country
  order by     date_id
  all rows per match
  pattern      (a {- b* -})
  define b as  dt < add_months(a.dt, 5)
);
  • This is exactly I am looking for and very smart answer. I test it and it works! I need some time to digest it and will keep you updated. thank you! – user1503 Nov 27 '18 at 21:24
  • @user1503 - feel free to ask follow-up questions as you study the solution. –  Nov 27 '18 at 21:33
  • Hi Mathguy, Thanks a lot for this answer and I picked up a lot of useful function tools. However, I recently run into a problem which seems not solvable by match recognition. It seems sliding windows is the way to go. I posted the question https://stackoverflow.com/questions/55738319/extract-pair-of-rows-with-time-logic It seems my action here is not appropriate to ask you another question. If so, I will delete this msg. Thank you as always. – user1503 Apr 18 '19 at 14:56