0

I have a table TRANSACTIONS with almost 30 million transactions (13 COLUMNS). How Can I optimize following code? I tried with self join but it seemed to be less effective.

Logic: I want to get last transactions by sender-receiver_2 if receiver_2 exists, else by sender-receiver + calculate some statistics (10/30/90 days)

SELECT T.* FROM
(SELECT T.*, row_number() over (partition by T.SENDER, (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END) order by T.DATE_ACCEPT desc) as seqnum 
FROM 
(
SELECT T.*
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) CNT_10
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_30
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_90 
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
             AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_10
      ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_30
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_90
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) MAX_AMOUNT_10
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_30
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_90
FROM TRANSACTIONS T
) T ) T
WHERE T.SEQNUM = 1

Also I created index on (SENDER, DATE_ACCEPT).

Query plan

TABLE EXAMPLE

Sun Moon
  • 1
  • 1
  • Please add the query plan with the index – Radim Bača Jan 24 '22 at 07:41
  • You probably have copy-paste errors in your SQL: you always have `WHEN T.RECEIVER_2 IS NULL THEN` in subqueries, however, `WHEN T2.RECEIVER_2 IS NULL THEN` should be used as well. – Radim Bača Jan 24 '22 at 07:46
  • Can you please provide desired result (based on your sample data)? Please provide table data as table, not as screenshot, see https://meta.stackoverflow.com/q/285551 – Wernfried Domscheit Jan 24 '22 at 10:05

4 Answers4

1

Are you aware of the Analytic Functions Windowing Clause?

I don't get the logic of your query, but I guess it might be possible without any self-joins. Have a look this query, it could be a starting point:

SELECT 
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS CNT_10,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS CNT_30,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS CNT_90,
    AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS AVG_30,
    AVG(NVL2(T.RECEIVER_2, T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS AVG_90
FROM TRANSACTIONS

Note, RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) is equal to RANGE INTERVAL '10' DAY PRECEDING)

Another note, when I run your query on the sample data, then I get

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID_TRAN|SENDER|RECEIVER|RECEIVER_2|AMOUNT|AMOUNT_2|DATE_ACCEPT        |CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|SEQNUM|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1      |00010 |22222   |1112      |3000  |1000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|1      |00010 |22222   |2114      |3000  |2000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|2      |01236 |45872   |          |4000  |        |01.04.2021 22:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|3      |45872 |00010   |          |5000  |        |17.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

which looks quite pointless.

Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110
  • I tried to use analytical functions, but the averages didn’t work out correctly, I’ll try according to your example. And the example table was just a piece for understanding the data in the table, not for checking the calculation – Sun Moon Jan 24 '22 at 11:07
  • "calculate some statistics (10/30/90 days)" is not really precise. But the windowing function is actually intended for such calculations. – Wernfried Domscheit Jan 24 '22 at 11:30
  • This is a nice usage of *windowing_clause* (not sure who and why *dw*), unfortunately it fails with `ORA-30487` for `count distinct` calculation. But count distinct could be an *overkill* if the transaction table contains *unique* `ID_TRANS` @SunMoon – Marmite Bomber Jan 24 '22 at 13:57
  • The OP's example actualy implements not `AND CURRENT ROW ` but `AND CURRENT ROW EXCLUDE TIES` (as he consideres only transactions with *lover* date than the *last* transaction). This is the reason of the `null`s in the result on sample data. Not sure if it is intention or only *as implemented*. [`EXCLUDE TIES`](https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Analytic-Functions.html#GUID-527832F7-63C0-4445-8C16-307FA5084056) requires 21c. @SunMoon see also the related note in my answer. – Marmite Bomber Jan 24 '22 at 14:32
  • @MarmiteBomber, Using `RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '0.001' SECOND PRECEDING` should also work. – Wernfried Domscheit Jan 24 '22 at 17:36
0

The major problem in your query is the CASE in predicates. It invalidates the usage of any index. Therefore you need to use a virtual column:

ALTER TABLE Transactions ADD rec AS (
     CASE WHEN RECEIVER_2 IS NULL 
     THEN RECEIVER ELSE RECEIVER_2 END
);

The second step is to create an index with this column:

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept)

However, the index may not be used due to the query syntax. Replace the CASE syntax with the newly created column rec and also rewrite the greatest per group solution into a self-join. I add the reduced SQL example of how to do it.

select t.*,
    (
           select count(DISTINCT T2.id_tran)
           from transactions T2
           where T2.date_accept > T.date_accept - 10
                 AND T2.date_accept < T.date_accept
                 AND T2.rec = T.rec
                 AND T2.sender = T.sender
    ) CNT_10
from (
    select sender, rec, max(date_accept)
    from transactions
    group sender, rec
) tmax 
join transactions t on t.sender = tmax.sender and
                       t.rec = tmax.rec and
                       t.date_accept = tmax.date_accept

And if you want your statistical subqueries super fast, than add also other columns used in them:

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept, id_tran, amount)
Radim Bača
  • 10,646
  • 1
  • 19
  • 33
  • The problem is receiver_2 can be in the position receiver. I need to separate them, cause sum will differ noticable. Receiver - organization director, receiver_2 - employees – Sun Moon Jan 24 '22 at 08:42
  • @SunMoon I understand that you need to separate them. My proposal includes a virtual column, that simplifies queries and enables index usage. I do not propose getting rid of `receiver_2` and `receiver` attributes. – Radim Bača Jan 24 '22 at 09:05
  • I got it. Thank you I'll try – Sun Moon Jan 24 '22 at 09:11
0

Having an index on (SENDER, DATE_ACCEPT) will probably help.

And you can simplify & accelarate the query by using one LATERAL JOIN with conditional aggregation.
It allows to calculate more than 1 COUNT/AVG/MAX.

For example:

SELECT T.*, LT.*
FROM (
  SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
  FROM (
    SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
    , ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT DESC) AS RN
    FROM TRANSACTIONS
  ) TRANS
  WHERE RN = 1
) T
CROSS JOIN LATERAL (
  SELECT 
    COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_10
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_30
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_90
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_10
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_30
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_90
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_10
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_30
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_90
  FROM TRANSACTIONS T2
  WHERE T2.SENDER = T.SENDER 
    AND T2.DATE_ACCEPT > T.DATE_ACCEPT - 90
    AND NVL(T2.RECEIVER_2, T2.RECEIVER) = NVL(T.RECEIVER_2, T.RECEIVER)
) LT;
SENDER RECEIVER RECEIVER_2 DATE_ACCEPT AMOUNT AMOUNT_2 CNT_10 CNT_30 CNT_90 AVG_AMOUNT_10 AVG_AMOUNT_30 AVG_AMOUNT_90 MAX_AMOUNT_10 MAX_AMOUNT_30 MAX_AMOUNT_90
1 2 3 30-MAR-21 10 20 1 2 3 11.2 21.65 45.5 11.2 32.1 93.2

Demo on db<>fiddle here

LukStorms
  • 28,916
  • 5
  • 31
  • 45
0

Your logic with receiver / receiver2 makes it only confusing, it is not the cause of your performance problems.

You'll get the same problems with a simple models with SENDER, RECEIVER, AMOUNT and DATE_ACCEPT which I'm using in the example - adapt for your purpose.

First you should realize what is the cause of the problem.

You are joining a large transaction table with it's history, producing a big result only to aggregate it and calculate the aggregated measures.

The key idea is to aggregate first and join back to the transaction table in the second step.

The query below calculates first the max_date_accept for each sender / receiver to calculate the aggregate measures in the next step using the history window (example for window 10 day - adapt as required).

Note that I copy your logic of ignoring the last transaction in the calculation by adding the predicate DATE_ACCEPT < max_date_accept.

This leads to the result on NULL if there is only one transaction in the calculated time interval, which is probably not what you want.

with trans as (
select 
 ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
 max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
from TRANSACTIONS t
)
select 
  SENDER, RECEIVER,
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
  avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
  max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
from trans  
group by SENDER, RECEIVER; 

Could be that this result is already what you want, but if you realy want the complete set of the columns from the transaction table with the values of the first transaction, simple join the aggregated result to the transaction table:

with trans as (
select 
 ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
 max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
from TRANSACTIONS t
),
agg as (
select 
  SENDER, RECEIVER,
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
  avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
  max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
from trans  
group by SENDER, RECEIVER),
trans2 as (
select
 t.*,
 row_number() over (partition by SENDER, RECEIVER order by DATE_ACCEPT desc) as seqnum
from TRANSACTIONS t)
select
 trans2.*,
 agg.CNT_10, agg.AVG_AMOUNT_10, agg.MAX_AMOUNT_10
from trans2
join agg on trans2.SENDER = agg.SENDER and trans2.RECEIVER = agg.RECEIVER
where seqnum = 1; 

Performance note - check the execution plan of the query.

You should see only TABLE ACCESS FULL and HASH JOIN. Query of your type get often problems if they use NESTED LOOPS or FILTER join with INDEX ACCESS.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53