SQL improvement

Question

I realized the below SQL using DISTINCT is a bad thing for performance.

However, if that is not included it will return too many rows (due to the one-to-many relationship between Instrument and Party tables).

I have searched the internet and see a few suggestions about getting rid of the DISTINCT, but I honestly have no idea what the best solution is.

Can you please provide recommendations for improving this SQL and ensuring it only returns one row per instrument (as opposed to multiple rows for each instrument)?

Just FYI - some things I saw as performance suggestions:

Get rid of IN statement that is easy and will be done
Replace DISTINCT with GROUP BY does this really improve performance?
Perhaps do an outer SELECT that includes DISTINCT around the big SQL without the DISTINCT. My thought behind this is that the DISTINCT would be performed on a much smaller subset than the 1M++ rows in INSTRUMENT table.

My query:

select * 
from
(
  SELECT DISTINCT
    I.UOID,
    I.INSTRUMENT_ID,
    I.STATUS,
    I.A_ACTIVITY_ORIG,
    I.A_CURRENCY,
    I.A_OPER_BK_ORG_ORIG,
    I.A_POSITION_ACTIVE,
    I.A_PRODUCT,
    I.A_PRODUCT_TYPE,
    I.A_TERMS_ACTIVE,
    I.CURR_COI,
    I.BOUT_COI,
    I.A_CUST_RELATIONSHP,
    I.DATE_START,
    I.DATE_END,
    I.CLIB_COI,
    I.CLIB_BASE,
    I.BLIB_COI,
    I.BLIB_BASE,
    I.BOUT_BASE,
    I.AVAL_COI,
    I.SEQUENCE_NUM,
    I.AVAL_BASE,
    I.MAXU_COI,
    I.MAXU_BASE,
    I.A_CURRENCY_BASE,I.A_CLIENT_BANK,
    I.A_PRODUCT_CATEGORY,
    I.A_ASSIGNMENT_ACTV,
    I.A_RELATED_ACTIVITY,
    C.CUSTOMER_ID,
    C.SHORT_NAME
  FROM instrument I 
  INNER JOIN PARTY P ON P.A_INSTRUMENT = I.UOID 
  INNER JOIN CUSTOMER C ON P.A_CUSTOMER = C.UOID AND C.CUSTOMER_ID = :customerId
  WHERE (I.STATUS <> 'TMP') 
    AND (I.A_CLIENT_BANK = :clientBank)  
    AND I.A_PRODUCT_CATEGORY <> 'CM'  
    AND I.STATUS NOT IN ( 'CAN','CLO','DEA','LIQ')
)
where rownum <= :maxSize;

You want `distinct` of all those columns ? It seems your joins aren't complete or there's a preliminary grouping that's ought to be done from the set of records from a table which you aren't doing before joining, which leads to duplicate rows. — Kaushik Nayak, Nov 08 '18 at 06:37
@Kaushik Nayak: What makes you say so? Why do you think the joins are incomplete? A customer is related to multiple instruments, but every instrument shall be shown just once. As there is the primary key in the select list, it should be easy for the optimizer to apply distinct on the set. — Thorsten Kettner, Nov 08 '18 at 07:08
The data model looks strange. Why is there both a customer_id and a uoid in the customer table? Why is there both an instrument_id and a uoid in the instruments table? That looks like there are two technical unique IDs per table. If that is the case you should remove one of them. — Thorsten Kettner, Nov 08 '18 at 07:12
One reason for poor performance could be the `<>` and `NOT IN` conditions. Unless you have a Bitmap-Index it is always expensive to search for "something which is not ..." because Oracle have to check entire data set. Is it possible to set condition like `I.STATUS IN (...)`? — Wernfried Domscheit, Nov 08 '18 at 08:14
`I.STATUS <> 'TMP'` and `I.STATUS NOT IN ( 'CAN','CLO','DEA','LIQ')` can be combined to `I.STATUS NOT IN ( 'CAN','CLO','DEA','LIQ', 'TMP')` — Wernfried Domscheit, Nov 08 '18 at 08:16
Post [execution plan](https://stackoverflow.com/questions/34975406/how-to-describe-performance-issue-in-relational-database?answertab=active#tab-top) of your query and provide some basic information about that statistics. How many rows in `instrument` per `clientBank`, dtto for `CUSTOMER`and `customerId`. How many rows and `distinct` row do you expect for both criteria? — Marmite Bomber, Nov 08 '18 at 10:15

score 0 · Answer 1 · answered Nov 08 '18 at 07:05

The customer you are looking for is related to an instrument by a party. As a customer is also related to multiple parties, they can be related to the same instrument multiple times. You could use an IN or EXISTS clause to get all instruments that occur in the customer's parties, but you also want to select from the customer table. So you want to join instead, and this is one of the few situations where using DISTINCT really makes sense.

As to your ideas:

Get rid of IN statement that is easy and will be done

I see no reason why you would do that. Are you saying that you want to replace the IN with multiple OR? That only gets the query less readable and does the same thing internally.

Replace DISTINCT with GROUP BY does this really improve performance?

Don't. It does the same thing, but lessens readability. GROUP BY is for aggregation. As you are not using any aggregation function, use DISTINCT. The execution plan should be exactly the same.

Perhaps do an outer SELECT that includes DISTINCT around the big SQL without the DISTINCT. My thought behind this is that the DISTINCT would be performed on a much smaller subset than the 1M++ rows in INSTRUMENT table.

No, the steps will be just the same: find the customer's parties, find these parties' instruments that match your criteria, show only some of them.

I see nothing to improve your query, except for the use of indexes. I don't know, whether it is better to get from the customer via parties to instruments or vice versa, so I'd create indexes for both cases:

create index idxc1 on customer( customer_id, uoid );
create index idxp1 on party ( a_customer, a_instrument );
create index idxi1 on instrument ( uoid, a_client_bank, status, a_product_category );

create index idxi2 on instrument ( a_client_bank, status, a_product_category, uoid );
create index idxp2 on party ( a_instrument, a_customer );
create index idxc2 on customer( uoid, customer_id );

Then run the query or retrieve an explain plan and see which indexes are actually used and drop the others.

SQL improvement

1 Answers1