-2

Have a table like:

ID.   | Start  |  End  | Item \
Kate    1-Jan   31-Jan  Apple\
Kate    1-Feb   28-Feb  Apple\
Kate    1-Mar   15-Mar  Apple\
John    1-Jan   31-Jan  Apple\
John    1-Feb   28-Feb  Apple\
John    1-Mar   31-Mar  Apple\
Kate    16-Mar  31-Mar  Banana\
Kate    1-Apr   30-Apr  Banana\
Kate    1-May   31-May  Apple\
Kate    1-Jun   15-Jun  Apple

The output wanted is below:\

ID   |  Start |  End   | Item\
Kate    1-Jan   15-Mar  Apple\
John    1-Jan   31-Mar  Apple\
Kate    16-Mar  30-Apr  Banana\
Kate    1-May   15-Jun  Apple

I need to compress the dataset by combine the records start and end date if the customer buys the same item. In my case, if a custom switch from one item and then switch back again, these two period cannot be combined. I have tried "group by ID, Item" clause in sql and it doesn't work for my case as the Kate Apple records will be combined together for one period from 1-Jan to 15-Jun, which is overlapping the Kate Banana period.

Any help will be greatly appreciated!

Ajay A
  • 1,030
  • 1
  • 7
  • 19
PZ_
  • 1
  • Are the `start` and `end` data value is like that? Not in `date` format? – FanoFN Sep 24 '20 at 01:01
  • Please post the errors/results you have got after trying the things, you just can't expect an answer for your questions with out showing any research/efforts – Ajay A Sep 24 '20 at 05:44
  • See https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad – Strawberry Sep 24 '20 at 07:25

1 Answers1

1

Yes, you can combine the rows. Here is one method:

select id, item, min(start), max(end)
from (select t.*,
             sum(case when prev_end >= start - interval 1 day then 0 else 1 end) over
                 (partition by id, item order by start) as grp
      from (select t.*,
                   lag(end) over (partition by id, item order by start) as prev_end
            from t
           ) t
     ) t
group by id, item, grp;

The innermost subquery the previous end time for each row. The middle subquery then takes a cumulative sum of when the previous end "chains" to the current row. This assigns a group to adjacent rows with the same id and item values.

The outer query then aggregates to get the final result.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786