-1

I have two systems communicating with one another using send-acknowledge design where system 1 sends a record over to system 2 to be processed and once system 2 processes the message it will send an acknowledge message over to system 1 which changes the status from Pending -> Completed.

For example, at 1pm

Pending Records Table @ 1pm:

Record_Seq Type Status
1 A Pending
2 A Pending
3 B Pending
4 B Pending
5 C Pending
...
1000 Z Pending

Acknowledge Records Table @ 1pm:

Ack_seq Type Record_Seq
null null null

No acknowledge message received from system 2.


Acknowledge Records Table 1.05pm:

Ack_seq Type Record_Seq
1 A 1
2 B 3
3 B 4
4 A 2
5 C 5
... ...

Pending Records Table @ 1.05pm:

Record_Seq Type Status
1 A Completed
2 A Completed
3 B Completed
4 B Completed
5 C Completed
...
1000 Z Pending

The problem is that sometimes system 2 does not send over the acknowledged record so records could be left pending even though they were already processed.

I wrote the following to query to try to find the records that were missing their acknowledge

-- Since total number of pending records must equal the total number of acknowledge records,
-- The query counts and check for each type of pending records and joins it with the count of each type of ack record

SELECT * FROM (
    SELECT TYPE, COUNT(1) AS NUMBER_OF_PENDING FROM PENDING_RECORDS_TABLErecords
        WHERE TYPE IN (SELECT DISTINCT TYPE FROM PENDING_RECORDS_TABLE) 
        AND STATUS = 'PENDING'
        GROUP BY TYPE 
) AS RECORDS_SENT

LEFT JOIN -- SO THAT I CAN GET RECORDS THAT DID NOT RECEIVE ANY ACK AT ALL

(SELECT TYPE, COUNT(1) AS NUMBER_OF_RECEIVED FROM ACKNOWLEDGE_RECORDS_TABLE 
    WHERE TYPE IN (SELECT DISTINCT TYPE FROM PENDING_RECORDS_TABLE)
    GROUP BY TYPE
 ) AS ACK_RECEIVED
 
 ON (RECORDS_SENT.TYPE = ACK_RECEIVED.TYPE) 
 
 WHERE RECORD_SENT.NUMBER_OF_PENDING <> ACK_RECEIVED.NUMBER_OF_RECEIVED
 
 ORDER BY TYPE
  1. The issue with this query is that it runs very slowly and if the distinct number of types > 100, it becomes exceedingly slow. How do I optimise the query so that it can run considerably faster? (Note: the size of the records and ack table are quite large)

  2. I also encounter an issue where the query without the " WHERE RECORD_SENT.NUMBER_OF_PENDING <> ACK_RECEIVED.NUMBER_OF_RECEIVED" clause runs significantly faster than the query with this clause. I am not too sure what is causing the query to slow down because of this clause?

GMB
  • 216,147
  • 25
  • 84
  • 135
Benjamin
  • 79
  • 6
  • You `LEFT JOIN` to your derived table `ACK_RECEIVED` but `NUMBER_OF_RECEIVED` *must* have a non-`NULL` value, which isn't possible when no row is found. Is it intentional that you have an implicit `INNER JOIN`? If so, why not change it to an `INNER JOIN`? – Thom A May 02 '23 at 13:51
  • Also, the `DISTINCT` operator in your `IN` subqueries isn't needed. `1` is in both `1,2,3` and `1,1,1,1,2,2,2,2,2,3,3,3,3,3,3`. The `DISTINCT` may actually cause the query to be slower, as `DISTINCT` can be quite an expensive operator. The data engine *might* be "smart" enough to ignore the `DISTINCT` operator, but then again, it might not be. – Thom A May 02 '23 at 13:52
  • 1
    For performance question we need at a minimum: the tables and index definitions, and please share the query plan via https://brentozar.com/pastetheplan. The question is not answerable otherwise. – Charlieface May 02 '23 at 14:00
  • [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [mre] [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) – philipxy May 02 '23 at 20:21

1 Answers1

2

... query to try to find the records that were missing their acknowledgement

I suspect that you are overcomplicating things here.

This reads like exists. This query gives you the list of records showing as pending in the first table while they have an acknowledgement in the second table:

select *
from pending_records_table p
where status = 'PENDING'
  and not exists (
    select 1 
    from acknowledge_records_table a 
    where a.type = p.type and a.record_seq = p.record_seq
  )

The following index would be beneficial for this query: acknowledge_records_table(type, record_seq).

If you wanted to aggregate this resultset by type:

select type, count(*) as cnt
from pending_records_table p
where status = 'PENDING'
  and not exists (
    select 1 
    from acknowledge_records_table a 
    where a.type = p.type and a.record_seq = p.record_seq
  )
group by type
order by type
GMB
  • 216,147
  • 25
  • 84
  • 135