
I am analyzing Azure SQL DW and came across the term DWU (Data Warehouse Unit). The page on the Azure site only gives a crude definition of DWU. I want to understand how DWU is calculated and how I should scale my system accordingly.

I have also referred to this link, but it does not cover my question:

ViSu

4 Answers


In addition to the links you found, it helps to know that Azure SQL DW stores data in 60 parts called "distributions". If your DW is DWU100, all 60 distributions are attached to one compute node. If you scale to DWU200, 30 distributions are detached and reattached to a second compute node. If you scale all the way to DWU2000, you have 20 compute nodes, each with 3 distributions attached. So DWU is a measure of the compute/query power of your DW: as you scale up, you have more compute operating on less data per compute node.
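To see the node layout on your own instance, here is a minimal sketch against the node DMV sys.dm_pdw_nodes (this assumes you have permission to query DMVs on the DW):

    -- Count nodes by role; on Gen1 DWU2000 you would expect
    -- 20 COMPUTE rows, so 60 distributions / 20 nodes = 3 per node.
    SELECT type, COUNT(*) AS node_count
    FROM sys.dm_pdw_nodes
    GROUP BY type;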

Update: For Gen2 there are still 60 distributions, but the DWU math is a bit different. DWU500c is one full-size node (playing both compute and control node roles) with all 60 distributions mounted. Scales smaller than DWU500c are single nodes that are not full size (meaning fewer cores and less RAM than the full-size nodes on larger DWUs). DWU1000c is 2 compute nodes, each with 30 distributions mounted, plus a separate control node. DWU1500c is 3 compute nodes plus a separate control node. The largest, DWU30000c, is 60 compute nodes, each with one distribution mounted.
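If you want to verify which node each distribution is mounted on, here is a sketch using the catalog view sys.pdw_distributions (assuming it is exposed at your service level):

    -- How many of the 60 distributions sit on each node.
    -- On DWU1000c expect 30 per compute node; on DWU30000c, 1 per node.
    SELECT pdw_node_id, COUNT(*) AS distributions_on_node
    FROM sys.pdw_distributions
    GROUP BY pdw_node_id
    ORDER BY pdw_node_id;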

GregGalloway
  • Can 100 DWU be roughly related to six cores in the underlying virtualized infrastructure? – Steve Feb 07 '17 at 15:44
  • @Steve I have heard that before and the following DMV makes me believe you are right that the control node has 6 cores and each compute node has 6 cores. I'll let you know if I learn anything that contradicts that: select * from sys.dm_pdw_nodes_os_schedulers where status = 'VISIBLE ONLINE' – GregGalloway Feb 07 '17 at 18:55
  • The Gen2 core count per node is different (approximately 80 hyperthreaded cores on full size nodes). That DMV will show you the exact count on your DW. – GregGalloway Jun 05 '19 at 18:17
  • Interesting, Greg. Were you able to find out how much memory per node as well? – fobius Sep 25 '21 at 00:16
  • @fobius it’s around 300GB per node: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits#service-levels – GregGalloway Sep 25 '21 at 00:33
  • I ran the DMV on a 500c and got 72 rows; on the 1000c I got 222 rows. Is the idea that the number of rows corresponds to the number of schedulers, and thus the number of vCPUs? – fobius Sep 27 '21 at 18:44
  • That’s my understanding. And yes DW500c is one full size node playing both control and compute node roles. DW1000c is 2 compute nodes and one separate control node. – GregGalloway Sep 27 '21 at 18:46
  • For the 1000c then, 222/3 comes out to around 74 vCPUs each? Is that how you got to around 80 cores per node (assuming that as you scale up, you max out at around 80)? – fobius Sep 27 '21 at 18:49
  • @fobius I’ve seen different node counts over time. I don’t think they specify this implementation detail since the underlying hardware can change over time. – GregGalloway Sep 27 '21 at 18:51
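A grouped variant of the DMV from the thread above answers the per-node question directly (a sketch; exact scheduler counts vary by hardware generation, as noted):

    -- VISIBLE ONLINE schedulers per node roughly tracks vCPUs per node,
    -- e.g. 222 total rows on DW1000c across 3 nodes is ~74 each.
    SELECT pdw_node_id, COUNT(*) AS visible_schedulers
    FROM sys.dm_pdw_nodes_os_schedulers
    WHERE status = 'VISIBLE ONLINE'
    GROUP BY pdw_node_id;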

I just found this link, which shows the relationship between throughput and DWU.

ViSu

You can also check out the DWU calculator. This site walks you through the process of taking a capture of your existing workload and makes a recommendation on the number of DWUs necessary to fulfill the workload in Azure SQL DW.

http://dwucalculator.azurewebsites.net/


You can choose a DWU level based on how long your workload takes and how many tables are involved.

For example: if 100 DWUs take 15 minutes to process 3 tables, and you need the same work done in about 3 minutes, you might choose 500 DWUs (assuming roughly linear scaling).
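To act on such an estimate, you can scale from T-SQL (a sketch run against the master database; 'MyDw' is a placeholder database name, and linear scaling is only a rough first approximation):

    -- Rough sizing: target DWU ~ current DWU * (current time / target time),
    -- e.g. 100 * (15 min / 3 min) = 500.
    ALTER DATABASE MyDw MODIFY (SERVICE_OBJECTIVE = 'DW500c');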