1

I am trying to analyse a bunch of transaction data and have set up a series of different ranks to help me. The one I can't get right is the beneficiary rank. I want it to partition where there is a change in beneficiary chronologically rather than alphabetically.

Where the same beneficiary is paid from January to March and then again in June I would like the June to be classed a separate 'session'.

I am using Teradata SQL if that makes a difference.

I thought the solution was going to be a DENSE_RANK but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate it counts up the number of months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary then it is not chronological, I need the highest rank to be the latest Beneficiary.

SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
  ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
  ,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
  ,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS Beneficiary
  ,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
FROM table ORDER BY CustomerID, PaymentRank
CustomerID  Beneficiary Amount  DateStamp   Month   PaymentRank MonthRank   BeneficiaryRank TransactionRank
a   aa  10      Jan 1   1   1   1
a   aa  20      Feb 2   2   2   1
a   aa  20      Mar 3   3   3   2
a   aa  20      Apr 4   4   4   3
a   bb  20      May 5   5   1   1
a   bb  30      Jun 6   6   2   1
a   aa  30      Jul 7   7   5   2
a   aa  30      Aug 8   8   6   1
a   cc  5       Sep 9   9   1   1
a   cc  5       Oct 10  10  2   2
a   cc  5       Nov 11  11  3   3
b   cc  5       Dec 1   1   1   1

This is what I have so far, I want a column alongside this which will look like the below

CustomerID  Beneficiary Amount  DateStamp   Month   NewRank
a   aa  10      Jan 1
a   aa  20      Feb 1
a   aa  20      Mar 1
a   aa  20      Apr 1
a   bb  20      May 2
a   bb  30      Jun 2
a   aa  30      Jul 3
a   aa  30      Aug 3
a   cc  5       Sep 4
a   cc  5       Oct 4
a   cc  5       Nov 4
b   cc  5       Dec 1
ejderuby
  • 710
  • 5
  • 21
  • Thanks Gordon and Dnoeth for your support with this...unfortunately I still can't get over the last hurdle here. I have used @dnoeth's LAG workaround to get me the flag and that is showing correctly whenever there is a beneficiary change. It is the cumulative SUM which I still cannot get to work. `SELECT a.* ,SUM(flag) OVER(PARTITION BY CusId ORDER BY PaymentRank ASC) AS NewRank FROM (.....` My NewRank field is just a static SUM of the flag rather than a cumulative count. I have used PaymentRank as opposed to SystemDate as there are duplicate dates which I thought might be causing it. – Benet McManus Aug 30 '19 at 12:38
  • ...and apologies to @Gordon and dnoeth for the late response. – Benet McManus Aug 30 '19 at 12:39
  • I didn't get your response, probably due to the `'s` following `@dnoeth` :-) – dnoeth Sep 02 '19 at 10:39

3 Answers3

0

This is a type of gaps-and-islands problem. I would recommend lag() and a cumulative sum:

select t.*,
       sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end) over (partition by customerid, beneficiary order by systemdate)            
from (select t.*,
             lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
      from t
     ) t
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks @Gordon-Linoff, I am out of time today and keep getting a syntax error: Data Type "System_Date" does not match a Defined Type Name. I will have another go at wrapping my head around it tomorrow. – Benet McManus Aug 22 '19 at 16:47
  • @BenetMcManus . . . Your question has `SystemDate`. My answer has `systemdate` (it was misspelled in one place). There is no hyphen in the name. – Gordon Linoff Aug 22 '19 at 16:55
0
SELECT dt.*,
   -- now do a Cumulative Sum over those 0/1
   SUM(flag)
   OVER(PARTITION BY CustomerID
        ORDER BY SystemDate ASC
                ,flag DESC -- needed if the order by columns are not unique
        ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 ( 
    SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
      ,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS Beneficiary
      ,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1 
      ,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
    FROM table 
 ) AS dt
ORDER BY CustomerID, PaymentRank

Your problem with Gordon's query is probably caused by your Teradata release, LAG is only supported in 16.10+. But there's a simple workaround:

LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)

--is equivalent to 
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING))
dnoeth
  • 59,503
  • 4
  • 39
  • 56
0

Credits to @Gordon and @dnoeth for providing the ideas and code to get me on the right track.

The below is mostly ripped from dnoeth but needed to add ROWS unbounded preceding to get the aggregation correct. Without this it was just showing the total for the partition. I also changed the systemdate to paymentrank as I had to fiddle about a bit with duplicate entries on a day.

SELECT dt.*,
   -- now do a Cumulative Sum over those 0/1
   SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 ( 
    SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1 
      ,CASE WHEN  Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC  ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)  THEN 0 ELSE 1 END  AS Flag ) AS dt
ORDER BY CustomerID, PaymentRank

The inner query sets a flag whenever the beneficiary changes. The outer query then does a cumulative sum on those.

I was unsure what the unbounded preceding was doing and @dnoeth has a great explanation here Below is taken from that explanation.

•UNBOUNDED PRECEDING, all rows before the current row -> fixed

•UNBOUNDED FOLLOWING, all rows after the current row -> fixed

•x PRECEDING, x rows before the current row -> relative

•y FOLLOWING, y rows after the current row -> relative

Community
  • 1
  • 1
  • Sorry, I did a cut&paste from Gordon's answer and forgot to add the ROWS UNBOUNDED. In Standard SQL it's the default when ORDER BY is present, but in Teradata (due to historical reasons) you have to code it explicitly. – dnoeth Sep 02 '19 at 10:35
  • Another remark, if `PaymentRank` is not unique you might get different result when you rerun the Select. In that case add another column to Order By to make it unique and repeatable. – dnoeth Sep 02 '19 at 10:41
  • Ah, I made it unique by changing it from RANK to ROW but my understanding is that that would then be indiscriminate and therefore not repeatable. – Benet McManus Sep 02 '19 at 10:54
  • My remark was not fully correct, it's even worse, the Cumulative Sum might (and probably will) return a wrong result. You got `ORDER BY PaymentRank` in both OLAP-functions, but there's no guarantee that rows with the same rank are sorted the same twice. E.g. three rows with the ORDER BY value will always return flags 1,0,0 in the MIN but the 2nd Order might return a different order, e.g flags 0,1,0. Either add the same column to both Orders to make it unique or use `ORDER BY PaymentRank, flag desc` (prefered) in the Cumulative Sum. See my edited answer. – dnoeth Sep 02 '19 at 11:12
  • Thanks for this, I was still doing some testing on the results but have now implemented this fix with the ORDER BY ...,Flag DESC as suggested. Thanks again. – Benet McManus Sep 02 '19 at 13:54