
I am working with SQR to put together a report. I can't change the structure of the database, nor can I use PL/SQL to complete this task.

Since the report can be run from remote locations, I don't want to make multiple calls to the database from within SQR. My goal is to return everything in one SQL statement that includes only the records I need to report, to reduce the run time over a slow connection.

I have it working right now, but I'm concerned with the performance on the database.

The "transactions" table has the following fields that can be used for this purpose:

account_num number(10) -- the account number
seq_num number(10) -- not a real sequence, it is unique to account_num
check_num number(10) -- the number on the check
postdate date

The primary key is (account_num, seq_num)

Sample data looks like this:

account_num    seq_num  check_num   postdate
----------- ---------- ---------- ----------
          1         11        200 2014-07-13
          1         16        201 2014-07-14
          1         23        205 2014-07-15
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15
          3         11        201 2014-07-13
          3         12        202 2014-07-14
          3         15        203 2014-07-15

Note: There are many other types of transactions in the table, but we are filtering the list on the type of transaction, which isn't very important for this question, so I left that out. The volume of transactions seems to average around 750,000 per month (for all transactions, not just checks), and out of that, on average, about 10,000 check transactions get reported.

The selection criteria are to return all check transactions occurring between two dates (inclusive - usually the first and last day of the month) where the difference between any two adjacent sorted check numbers for an account is greater than X (we will use 10 in this case).

Using the above sample data, the results look like this:

account_num    seq_num  check_num   postdate
----------- ---------- ---------- ----------
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15

All checks from account_num 2 are returned because, with its check numbers sorted (231, 282, 284), the gap between 231 and 282 is 51, which is greater than 10.

I built the following SQL to return the results above:

select
  t1.*
from
  transactions t1
join (
  select
    t3.account_num,
    t3.min_postdate,
    t3.max_postdate,
    max(t3.check_diff)
  from (
    select distinct
      t4.account_num,
      lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
      min(t4.postdate) over (partition by t4.account_num) min_postdate,
      max(t4.postdate) over (partition by t4.account_num) max_postdate
    from
      transactions t4
    where
      t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
  group by
    t3.account_num,
    t3.min_postdate,
    t3.max_postdate
  having max(t3.check_diff) > 10) t2
    on t1.account_num = t2.account_num
    and t1.postdate between t2.min_postdate and t2.max_postdate
;

I would like to return the seq_num of all the checks from t4 so I end up using the primary key on t1. I've tried using LISTAGG, which works to get the numbers together:

listagg(t4.seq_num,',') within group (order by seq_num) over (partition by account_num) seq_nums

But this is where I'm stuck: joining on a comma-delimited string. I can get it to work using INSTR, but it can't use the primary key and the performance is terrible.

instr(',' || t2.seq_nums || ',', ',' || t1.seq_num || ',') > 0

And I tried joining to it like this:

join (
  select
    t2.account_num,
    regexp_substr(t2.seq_nums,'[^,]+',1,level) seq_num
  from
    dual
  connect by
    level <= length(regexp_replace(t2.seq_nums,'[^,]*')) + 1) t5
  on t1.account_num = t5.account_num
  and t1.seq_num = t5.seq_num

But I should have known better (ORA-00904) - t2 will never be visible inside the select of the join.
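For what it's worth, a correlated inline view like that only became legal in Oracle 12c, via LATERAL inline views and CROSS APPLY, which do let the subquery see t2. A minimal sketch of the splitting step, assuming 12c and using a hypothetical one-row t2 stand-in for the real grouped inline view with its LISTAGG column:

```
-- Sketch only: needs Oracle 12c+ for CROSS APPLY. The WITH clause here is
-- a hypothetical stand-in for the earlier t2 inline view plus a
-- listagg(...) seq_nums column.
with t2 as (
  select 2 as account_num, '52,66,72' as seq_nums from dual
)
select t2.account_num, t5.seq_num
from t2
cross apply (
  -- split the comma-delimited list back into one row per seq_num;
  -- t2.seq_nums is visible here because of CROSS APPLY
  select to_number(regexp_substr(t2.seq_nums, '[^,]+', 1, level)) as seq_num
  from dual
  connect by level <= regexp_count(t2.seq_nums, ',') + 1
) t5;
```

On the sample row this expands '52,66,72' into three rows that could then be joined to t1 on (account_num, seq_num).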

Does anyone have any clever ideas?

Bob
    I set this up here http://sqlfiddle.com/#!4/e62e3/1 if you want to play. – Bob Jul 16 '14 at 17:56
  • You can use t1 to do the LISTAGG, right? I don't know if I'm missing something. I tried LISTAGG with t1 and was able to get the seq_nums - http://sqlfiddle.com/#!4/e62e3/5 – Joseph B Jul 16 '14 at 18:07
  • @Joseph No - The goal is to get from t4 into t1 using the primary key. So t4 needs to return all the seq_nums in a list or some kind of table structure so it can be used to grab the exact transactions in t1 instead of using the dates as it is now. – Bob Jul 16 '14 at 18:14
  • The results that are in the question are the results that I need. I'm just trying to find a way to get them using the primary key on the top-level "transactions" table (account_num, seq_num), t1. – Bob Jul 16 '14 at 18:28

1 Answer


I'd avoid the joins altogether by using subqueries and more analytic functions:

select
  account_num, seq_num, check_num, postdate
from
  (
    select account_num,
      seq_num,
      check_num,
      postdate,
      max(check_gap) over (partition by account_num) as max_check_gap
    from
      (
        select account_num,
          seq_num,
          check_num,
          postdate,     
          lead(check_num) over (partition by account_num order by check_num)
            - check_num as check_gap
        from
          transactions
        where postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))
    )
  )
where
  max_check_gap > 10
order by account_num, check_num;

SQL Fiddle with your original query, an intermediate attempt that misread the 10-check gap rule, and this version. All give the same result for this data.

This doesn't tackle the specific question you asked but hopefully addresses your underlying performance concern in a different way.
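If the monthly range scan itself turns out to be the bottleneck, a covering index is one thing to look at. This is an assumption about your schema and workload rather than anything from the question, and the transaction-type column you filter on would presumably belong in it too:

```
-- Hypothetical index; column order assumes the postdate range is the main
-- filter. Including account_num, check_num and seq_num lets the inner query
-- be answered from the index alone, without visiting the table.
create index transactions_post_ix
  on transactions (postdate, account_num, check_num, seq_num);
```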


If you did want to stick with the joins - which hit the table multiple times and so will be less efficient - you could use COLLECT. This is rough; the table access could probably be improved:

select
  t1.*
from
  transactions t1
join (
  select
    t3.account_num,
    collect(t3.seq_num) as seq_nums,
    t3.min_postdate,
    t3.max_postdate,
    max(t3.check_diff)
  from (
    select distinct
      t4.account_num,
      t4.seq_num,
      lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
      min(t4.postdate) over (partition by t4.account_num) min_postdate,
      max(t4.postdate) over (partition by t4.account_num) max_postdate
    from
      transactions t4
    where
      t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
  group by
    t3.account_num,
    t3.min_postdate,
    t3.max_postdate
  having max(t3.check_diff) > 10) t2
    on t1.account_num = t2.account_num
    and t1.seq_num in (select * from table(t2.seq_nums))
;

SQL Fiddle.
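One caveat with COLLECT: depending on your version and client, TABLE() over the transient collection type it creates can raise errors like ORA-22905. Casting the collection to a built-in type such as SYS.ODCINUMBERLIST sidesteps that; a hedged variant of the two affected lines:

```
-- in the t2 inline view's select list:
cast(collect(t3.seq_num) as sys.odcinumberlist) as seq_nums,

-- and in the join condition:
-- and t1.seq_num in (select column_value from table(t2.seq_nums))
```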

Alex Poole
  • Using the additional analytic function as you did is a much better approach as it only has to hit the table once. That's perfect! Not only that, you answered the initial question too... COLLECT is what I wanted. I'll have to read up on that. Thanks so much! – Bob Jul 16 '14 at 21:21