Optimize aggregating query

Question

I have a view that has suddenly gotten too slow and I'm at a loss of how to optimize it. The tables currently contain 15000 (@dispatchPallet) and 135000 (@pickLog) rows respectively.

I've written a minimized piece of code to show the important parts below.

DECLARE @dispatchPallet TABLE 
(
    [PICK_PALL_NUM] [bigint] NOT NULL,
    [PALLET_PLACEMENT] [nvarchar](4) NOT NULL,
    [SHIPMENT_ID] [nvarchar](255) NULL
)

DECLARE @pickLog TABLE 
(
    [LINE_NUM] [int] NOT NULL,
    [QTY_PRE] [numeric](9, 2) NULL,
    [QTY_SUF] [numeric](9, 2) NULL,
    [PICK_PALL_NUM] [bigint] NULL,
    [ROWID] [uniqueidentifier] NOT NULL,
    [WEIGHT_GROSS] [numeric](9, 3) NULL,
    [VOLUME] [numeric](9, 3) NULL
)


INSERT INTO @dispatchPallet ([PICK_PALL_NUM], [PALLET_PLACEMENT], [SHIPMENT_ID])
VALUES 
(4797753, 'B', 'SHIPMENT-1'),
(4797752, 'B', 'SHIPMENT-2'),
(4797750, 'B', 'SHIPMENT-3'),
(4797749, 'B', 'SHIPMENT-4'),
(4797739, 'B', 'SHIPMENT-5'),
(4797732, 'B', 'SHIPMENT-6'),
(4797731, 'B', 'SHIPMENT-7'),
(4797730, 'B', 'SHIPMENT-7'),
(4797723, 'B', 'SHIPMENT-8'),
(4797713, 'B', 'SHIPMENT-9')

INSERT INTO @pickLog ([LINE_NUM], [QTY_PRE], [QTY_SUF], [PICK_PALL_NUM], [ROWID], [WEIGHT_GROSS])
VALUES 
(30, 54, 54, 4797753, NEWID(), 1070.280),
(10, 24, 24, 4797752, NEWID(), 471.360),
(30, 12, 12, 4797750, NEWID(), 237.960),
(320, 25, 25, 4797749, NEWID(), 102.750),
(110, 3, 3, 4797739, NEWID(), 40.650),
(40, 12, 12, 4797732, NEWID(), 238.080),
(50, 4, 4, 4797732, NEWID(), 78.560),
(20, 20, 20, 4797731, NEWID(), 110.000),
(20, 40, 40, 4797730, NEWID(), 220.000),
(1340, 3, 3, 4797723, NEWID(), 14.250),
(410, 2, 2, 4797723, NEWID(), 4.780),
(440, 2, 2, 4797723, NEWID(), 21.000),
(480, 1, 1, 4797723, NEWID(), 3.500),
(1290, 2, 2, 4797723, NEWID(), 39.280),
(470, 1, 1, 4797723, NEWID(), 8.500),
(280, 3, 3, 4797723, NEWID(), 16.500),
(10, 2, 2, 4797723, NEWID(), 10.700),
(500, 2, 2, 4797723, NEWID(), 6.600),
(290, 1, 1, 4797713, NEWID(), 0.540),
(40, 2, 2, 4797713, NEWID(), 33.800)

SELECT 
    [dispatchPallet].[SHIPMENT_ID], 
    SUM([pickLog].[QTY_SUF]) AS KOLLI,
    COUNT(DISTINCT [pickLog].[LINE_NUM]) AS LINES,
    SUM([pickLog].[WEIGHT_GROSS]) AS PICKED_WEIGHT, 
    COUNT(DISTINCT [pickLog].[PICK_PALL_NUM]) AS PALLETS,
    COUNT(DISTINCT CASE WHEN [dispatchPallet].[PALLET_PLACEMENT] = 'B' THEN [dispatchPallet].[PICK_PALL_NUM] ELSE NULL END) AS BOTTOM_PALLETS       
FROM 
    @dispatchPallet dispatchPallet 
    INNER JOIN @pickLog pickLog ON [dispatchPallet].[PICK_PALL_NUM] = [pickLog].[PICK_PALL_NUM]
GROUP BY 
    [dispatchPallet].[SHIPMENT_ID]

-- Expected output:
-- SHIPMENT_ID  KOLLI   LINES   PICKED_WEIGHT   PALLETS BOTTOM_PALLETS
-- SHIPMENT-1   54.00   1       1070.280        1       1
-- SHIPMENT-2   24.00   1       471.360         1       1
-- SHIPMENT-3   12.00   1       237.960         1       1
-- SHIPMENT-4   25.00   1       102.750         1       1
-- SHIPMENT-5   3.00    1       40.650          1       1
-- SHIPMENT-6   16.00   2       316.640         1       1
-- SHIPMENT-7   60.00   1       330.000         2       2
-- SHIPMENT-8   18.00   9       125.110         1       1
-- SHIPMENT-9   3.00    2       34.340          1       1

could you show the explain plan ? Also, "*suddenly gotten too slow*" makes me think about a stats issues. Are all your table stats up-to-date ? — Thomas G, Apr 30 '18 at 11:38
Why do you need a transaction for a view? Do the real table have indexes? Post a query plan. Do you really need big int? — paparazzo, Apr 30 '18 at 12:59
@paparazzo There is no transaction in the view, this is just a habit for testing purposes. The main focus of the question is the SELECT - if there is a better way to do the DISTINCT / SUM / CASE / COUNT. — Danieboy, Apr 30 '18 at 13:33
@ThomasG How do I check that? For me "suddenly" is when the tables got slightly bigger one day (we only save 4 days of information). — Danieboy, Apr 30 '18 at 13:35
[Check table stats](https://basitaalishan.com/2013/04/15/determining-when-statistics-were-last-updated-in-sql-server/) (its mostly a thing for your DBA) and [Show Explain Plan](https://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan) — Thomas G, Apr 30 '18 at 13:39
Then take transaction out of the question, update the question with indexes, and post the query plan. — paparazzo, Apr 30 '18 at 13:46

score 2 · Answer 1 · edited Apr 30 '18 at 17:50

You should at least create primary constraint on as

ALTER TABLE @dispatchPallet TABLE   ADD PRIMARY KEY (PICK_PALL_NUM);

Foreign Key constraint as

ALTER TABLE @pickLog TABLE  ADD foreign key (PICK_PALL_NUM) references @dispatchPallet(PICK_PALL_NUM)

Also create a unique index on

CREATE UNIQUE NONCLUSTERED INDEX idx_PALLET_PLACEMENT_notnull
ON @dispatchPallet(PALLET_PLACEMENT)
WHERE PALLET_PLACEMENT IS NOT NULL;

score 1 · Accepted Answer · answered Apr 30 '18 at 13:45

1

Your query is simple and there isn't much room to optimize. You should check that you at least have indexes on dispatchPallet by SHIPMENT_ID and on pickLog by PICK_PALL_NUM. These would be the best choices for your query:

CREATE NONCLUSTERED INDEX NCI_dispatchPallet_shipment_ID 
    ON dispatchPallet (SHIPMENT_ID, PICK_PALL_NUM)
    INCLUDE (PALLET_PLACEMENT)


CREATE NONCLUSTERED INDEX NCI_pickLog_pick_pall_num 
    ON pickLog (PICK_PALL_NUM)
    INCLUDE (QTY_SUF, LINE_NUM, WEIGHT_GROSS)

You should also validate if you need your COUNT to be DISTINCT or not (distinct is an expensive operation).

Last but not least, you should really check how you access the view; if you are filtering it, joining it, etc. These other conditions might generate different query plans and make your performance go down if not managed correctly (even with the right indexes!).

answered Apr 30 '18 at 13:45

EzLo

13,780
10
33
38

I've looked at the indexes and they made it slightly faster. What do you mean by "how you access the view" exactly? – Danieboy May 02 '18 at 06:19
The view itself is just a syntactic shortcut, it doesn't guarantee that by using it the SQL engine will use the correct indexes. By "accessing the view" I mean which is the SQL you use to actually query the view. `SELECT * FROM YourView` is completely different than `SELECT * FROM YourView INNER JOIN OtherTable ON SomeColumn = OtherColumn WHERE PICK_PALL_NUM = 9142`. The performance overall will be highly dependant on filters and joins with other tables. – EzLo May 02 '18 at 06:57
The view is selected pretty much like this: "SELECT * FROM VIEW ORDER BY PICK_PALL_NUM DESC". – Danieboy May 02 '18 at 07:45
Well then there isn't much more you can do with just tunning this SQL. Check the query execution plan and make sure that the indexes we mentioned are being used and their fragmentation is rather low (rebuild if it's big). Other than that, you might have performance dropdowns by network problems, server overloads or other operations blocking the tables at the moment you query. – EzLo May 02 '18 at 07:59

score 0 · Answer 3 · answered Apr 30 '18 at 13:30

0

For starters there should be primary keys and foreign keys on these tables so that this query can do index seeks/scans (paparazzo's comment above) as opposed to full table seeks/scans.

In addition to the bigint/int, what's the purpose of the uniqueidentifier?

answered Apr 30 '18 at 13:30

Jim Horn

879
6
14

In this specific query on the tables the uniqueidentifier isn't really required. But it's for other queries on the table. Also bigint is needed, don't look at the specific data in the example too much. The uniqueidentifier is the primary key of the table. I did not create this myself.. There is an index on PICK_PALL_NUM and SHIPMENT_ID so that shouldn't be an issue. – Danieboy Apr 30 '18 at 13:37

Optimize aggregating query

3 Answers3