0

I have implemented CDC - SCD Type 2 on the customer data set.

I have implemented CDC on large of columns but the ask is to track behavior for only subset of those columns.

In the below input table I have ID column for the customer and RATE_CODE as one of the CDC field and START and END DATES are the CDC changes dates.

In this I wan't to know how over a period of time the customer data(RATE_CODE) is changing.

EX Row 1-3 has same RATE_CODE thus i need min(START_DATE) from ROW#1 and max(END_DATE) from ROW#3.

I tried applying group by on (ID,RATE_CODE) and min and max on dates but it is giving wrong value as in that case the max value will be picked from ROW#9 (for which I want a separate entry considering the ROW#6-9)

INPUT TABLE

ROW NUMBER ID RATE_CODE START_DATE END_DATE
1 1 A1 01-01-2021 18-01-2021
2 1 A1 18-01-2021 25-02-2021
3 1 A1 25-02-2021 15-03-2021
4 1 A2 15-03-2021 28-03-2021
5 1 A2 28-03-2021 28-05-2021
6 1 A1 28-05-2021 28-06-2021
7 1 A1 28-06-2021 12-07-2021
8 1 A1 20-07-2021 28-07-2021
9 1 A1 28-08-2021 13-09-2021
10 1 A2 13-09-2021 13-10-2021

EXPECTED OUTPUT

ID RATE_CODE START_DATE END_DATE
1 A1 01-01-2021 15-03-2021
1 A2 15-03-2021 28-05-2021
1 A1 28-05-2021 13-09-2021
1 A2 13-09-2021 13-10-2021

There could be some articles or answer on the net as well but due to framing of the question I couldn't find them.

I want the solution in SQL but for the community PySpark and other languages are also welcomed.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • SELECT ID,RATE_CODE,MIN(START_DATE),MAX(END_DATE) FROM #TBL GROUP BY ID,RATE_CODE – Python Learner Aug 03 '21 at 08:00
  • 2
    This is known as a "gaps and island" problem. Lots of examples of this; I've added a couple as dupe candidates. – Thom A Aug 03 '21 at 08:01
  • @Larnu Thanks for the keyword. I knew it was a common scenario but didn't know how to search. In case you are aware can you help me with the keywords for other such scenario of SQL's any link or tutorials for that matter. – Python Learner Aug 03 '21 at 08:07
  • @PythonLearner google advanced sql queries. – Dale K Aug 03 '21 at 08:18
  • 1
    Don't worry about not know what the terminology is, @PythonLearner , we all didn't know what it was at one point. Once you do know what it is though, normally due to asking how to implement the solution for such terms without knowing what it is and finding out what it is in the answer (like here), it makes searching for the answer vastly easier. :) – Thom A Aug 03 '21 at 08:25
  • @Larnu . . . It is really sad that this is marked as a duplicate. For this particular variant, there is a better solution than in either of those two answers. But people here are so uptight about duplicate questions. – Gordon Linoff Aug 03 '21 at 11:17
  • *"But people here are so uptight about duplicate questions."* *Cough* With respect, @GordonLinoff, until you *learn* where the "close as duplicate" feature is, I don't feel you can weigh in here. – Thom A Aug 03 '21 at 11:19
  • Also, if you think your [duplicate](https://stackoverflow.com/a/38151729/2029983) isn't good enough any more, @GordonLinoff , then [edit](https://stackoverflow.com/posts/38151729/edit) it. You were [told this before](https://meta.stackoverflow.com/a/408957/2029983) by a huge mass of the community. Learn from your mistakes. – Thom A Aug 03 '21 at 11:26
  • @Larnu . . . Not at all. I don't think it is a duplicate. I think the constraints on this question are quite different. – Gordon Linoff Aug 03 '21 at 12:15
  • Again, with respect @GordonLinoff , your opinion of what isn't a duplicate has been proven to be quite misaligned with the community. If this didn't answer the question, I would have expected the OP to have let us know by now with an edit and a comment (directed at myself) explaining why it is not a duplicate. The fact that they have not, and actually responded quite well to the candidates suggests that the answer has been provided by said duplicate candidates. – Thom A Aug 03 '21 at 12:18
  • @Larnu . . . It is sad that someone asking a question here is not getting the best answer to their question. – Gordon Linoff Aug 03 '21 at 12:23
  • @GordonLinoff Thought the provided solution are not exactly the same as my case but are on the same line and Since they involve CTE (need to get idea about it as well) so I need some time to built a logic for my case. Although I also wanted the solution in PySpark for the learning purpose and for the community but couldn't find a way to undo the Duplicate(Not reposted the question because at least I now know on which lines to works) .Incase there are better solution they are most welcomed – Python Learner Aug 03 '21 at 13:13
  • @PythonLearner . . . I'm not saying they don't work. I'm saying there is a better answer. – Gordon Linoff Aug 03 '21 at 15:56
  • @GordonLinoff In that case it would be really helpful if you can share that as well. Different approach to the problem will help me in understand advanced SQL part better. – Python Learner Aug 03 '21 at 16:55
  • @PythonLearner . . . Unfortunately, the question was closed with an inappropriate duplicate before I had an opportunity to answer. – Gordon Linoff Aug 03 '21 at 23:57

0 Answers0