
I have a Synapse Workspace set up with a Dedicated Pool running at DW100c.

The issue is that my data set is very small - I am creating some dimensions and facts, and the largest fact table is around 300,000 records. The dimensions are much smaller.

We were using Azure SQL Database with a view to migrating into Synapse. Some of the stored procedures we were running in SQL Database (2 cores) were taking around 2 minutes, whereas in Synapse they take 6-7 minutes.

I can't understand it - if a Synapse Dedicated Pool is a more scaled-up service, why do my queries take much longer? If I scale it up to DW500c it does speed up, but surely that can't be the answer? I have created the tables using round robin / replicated / hash distributions where necessary.

Is the issue that my dataset is too small for Synapse Dedicated Pools?

Ash174

1 Answer


A Synapse dedicated SQL pool should still be fast even with just 300K records in a round-robin distribution, as long as you update statistics before you start querying; otherwise the first query will initiate a full table scan to build them.
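
For example, a minimal sketch of creating and refreshing statistics on a fact table (dbo.FactSales and its columns are hypothetical names for illustration):

```sql
-- Create single-column statistics on columns used in joins and filters
-- (hypothetical table and column names).
CREATE STATISTICS stats_FactSales_ProductID ON dbo.FactSales (ProductID);
CREATE STATISTICS stats_FactSales_DateKey   ON dbo.FactSales (DateKey);

-- Refresh all statistics on the table after a load or update
UPDATE STATISTICS dbo.FactSales;
```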

When you do use hash distribution, remember to pick a distribution key that spreads rows fairly evenly across the 60 distributions - meaning don't pick a key that has only a few unique values and/or a large percentage of NULL values.
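
A sketch of what that looks like in the table DDL, assuming a hypothetical FactSales table where CustomerKey has many distinct values and few NULLs:

```sql
CREATE TABLE dbo.FactSales
(
    SaleID      BIGINT        NOT NULL,
    CustomerKey INT           NOT NULL,
    ProductID   INT           NOT NULL,
    DateKey     INT           NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    -- CustomerKey is assumed to be high-cardinality with few NULLs,
    -- so rows hash out fairly evenly across the 60 distributions.
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);
```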

Make sure your dimension tables all use replicated distribution.
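
For a small dimension, that would look roughly like this (dbo.DimProduct is a hypothetical example):

```sql
CREATE TABLE dbo.DimProduct
(
    ProductID   INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
)
WITH
(
    -- REPLICATE keeps a full copy of the table on each compute node,
    -- so joins to the fact table avoid data movement.
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (ProductID)
);
```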

Understand that your data is spread across 60 separate distributions, so if you query with a WHERE clause on a specific distribution-key value (e.g., ProductID = 90099), only 1 of your 60 distributions will be doing any work. At DW100c that is not much processing power. Choose a different distribution key so you can take advantage of the parallel processing capabilities.
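
One quick way to see how evenly rows are spread across the 60 distributions is DBCC PDW_SHOWSPACEUSED, shown here against the hypothetical fact table from above:

```sql
-- Returns row counts and space used per distribution;
-- large differences in row counts indicate skew on the distribution key.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');
```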

You can try setting up some clustered and non-clustered indexes on your fact tables, and make sure to update statistics after any updates to those tables.
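
As a sketch, for a table this small a clustered rowstore index (rather than the default clustered columnstore) plus a nonclustered index on a frequent filter column might look like this - table and column names are again hypothetical:

```sql
CREATE TABLE dbo.FactSalesSmall
(
    SaleID      BIGINT NOT NULL,
    CustomerKey INT    NOT NULL,
    DateKey     INT    NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX (SaleID)      -- rowstore clustered index
);

-- Nonclustered index on a commonly filtered column
CREATE INDEX IX_FactSalesSmall_DateKey ON dbo.FactSalesSmall (DateKey);

-- Refresh statistics after the fact table changes
UPDATE STATISTICS dbo.FactSalesSmall;
```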

I normally avoid HASH distribution unless I have a table with 80M to 100M rows or more, because row groups are created in each of the 60 distributions, and for good columnstore compression you want each row group to hold roughly 1M rows.
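
A quick back-of-envelope check using the 300K-row fact table from the question shows why:

```sql
-- 300,000 rows spread over 60 distributions is only 5,000 rows each,
-- far below the ~1M rows an ideal columnstore row group holds.
SELECT 300000 / 60 AS rows_per_distribution,  -- = 5000
       1048576     AS ideal_rowgroup_rows;
```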

metaxiom