1

Afternoon all,
I'm trying to pull together a query that will return financial overview information about our customers.
I have a financial_history table of about 10m rows that contains details of every transaction and I want to see each customers first, last and largest transactions.
The tricky bit (and, yes, this is in part a data quality issue and, yes it's very annoying) is that practically every single customer has more than one transaction on the date of their first/last and has several at the same maximum value. It's not entirely hopeless because each transaction has an ID number that resets daily so the date and ID forms a unique identifier and the transaction IDs are also sequential so, in the case of the earliest transaction, where there are multiple rows for a customer on the same date I want the one with the lowest transaction number
The other problem is that this needs to be run as a sproc job to populate a table with updated info every day so efficiency/runtime is an issue.
Reading through previous questions and answers here I've got as far as a query that runs reasonably quickly but I can't figure out how to incorporate the transaction ID element to select only the single relevant row.
For the largest transaction, for example, the query looks like this

select
fh.contact_number
,fh.maxamt
,fh2.transaction_date
from
(select
#fh.contact_number
,MAX(#fh.amount) maxamt
from #fh
group by
#fh.contact_number) fh
INNER JOIN #fh as fh2
ON fh2.amount = fh.maxamt and fh2.contact_number = fh.contact_number;

and returns information like this

| contact_number | maxamt |     transaction_date    |
|:--------------:|:------:|:-----------------------:|
|        1       |   100  | 2010-06-01 00:00:00.000 |
|        1       |   100  | 2012-06-01 00:00:00.000 |
|        1       |   100  | 2013-06-03 00:00:00.000 |
|        2       |   500  | 2011-04-09 00:00:00.000 |
|        2       |   500  | 2013-11-21 00:00:00.000 |

the financial_history table looks like this

contact_number int
,transID int
,amount money
,transaction_date datetime

Any ideas how I can change this query to select only the row with the largest transID? I'm also open to completely different approaches but I've tried a lot of different things (most of them copied from other questions here) and this is by far and away the fastest running.
I will be immensely grateful for any help or suggestions Thanks!

Community
  • 1
  • 1
Liam
  • 45
  • 4

1 Answers1

0

This will return all of the transaction of the maximum amount. You can add additional criteria to amount_rank SORT BY clause if you want to reduce the number of transactions returned.

SELECT  *
FROM    ( SELECT    contact_number ,
                    transID ,
                    amount ,
                    transaction_date ,
                    RANK() OVER ( PARTITION BY contact_number ORDER BY amount DESC ) AS amount_rank ,
                    ROW_NUMBER() OVER ( PARTITION BY customer_number ORDER BY transaction_date ASC, transID ASC ) AS first_transaction ,
                    ROW_NUMBER() OVER ( PARTITION BY customer_number ORDER BY transaction_date DESC, transID DESC ) AS last_transaction
          FROM      #fh
        ) fh
WHERE   ( fh.amount_rank = 1
          OR fh.first_transaction = 1
          OR fh.last_transaction = 1
        )
Jim V.
  • 2,137
  • 16
  • 14