
I'm experimenting with Snowflake and would like to understand how its architecture produces the results below. I ran three types of queries (A, B, C) that scan different amounts of data; the Bytes Scanned figures are read from Total Statistics in the Profile Overview. I compared the total execution times on a Small warehouse and a Large warehouse. When the scan size is small, the effect of warehouse size is small; the larger the scan size, the closer the S/L performance ratio gets to the 4× difference in warehouse size (Small has 2 nodes, Large has 8).

I would like to know the principle behind this result.

Query     Bytes Scanned (MB)   Execution Time Large (ms)   Execution Time Small (ms)   S/L ratio
QueryA    1860                 1350                        2800                        2.1
QueryB    6100                 3800                        12500                       3.3
QueryC    51940                19310                       77000                       4.0
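
As a quick sanity check of the ratios in the table, here is a throwaway Python snippet (the numbers are just the measured values copied from above; the 4× ceiling is the Small-vs-Large node-count ratio mentioned in the question):

    # Compare the measured S/L ratio of each query against the
    # theoretical 4x ceiling (Large = 8 nodes vs Small = 2 nodes).
    measurements = {
        # query: (bytes_scanned_mb, large_ms, small_ms)
        "QueryA": (1_860, 1_350, 2_800),
        "QueryB": (6_100, 3_800, 12_500),
        "QueryC": (51_940, 19_310, 77_000),
    }

    THEORETICAL_MAX = 8 / 2  # node-count ratio of Large to Small

    for query, (mb, large_ms, small_ms) in measurements.items():
        ratio = small_ms / large_ms
        print(f"{query}: {mb} MB scanned, S/L ratio {ratio:.1f} "
              f"({ratio / THEORETICAL_MAX:.0%} of the 4x ceiling)")

The larger the scan, the closer the measured ratio gets to the 4× ceiling.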
SNWsnow

2 Answers


To quote from the documentation:

“The size of a warehouse can impact the amount of time required to execute queries submitted to the warehouse, particularly for larger, more complex queries. In general, query performance scales with warehouse size because larger warehouses have more compute resources available to process queries.”

Also:

“Larger is not necessarily faster for small, basic queries.”

Does this help? If not, please explain what other information you are looking for.

NickW
  • Thank you very much. However, I'm hoping for an answer that identifies the specific mechanism producing this result, for example query plan optimization, micro-partitioning, or data clustering. I would like to know which of these processes is the influencing factor. – SNWsnow Aug 03 '21 at 08:50
  • The only people who might be able to explain what's going on, down to the level you appear to be looking for, are the Snowflake people - and they'd need access to the relevant query ids etc. in order to be able to investigate. So unless you have a specific support issue (and the appropriate support contract) I doubt this is going to happen – NickW Aug 03 '21 at 10:07
  • Thank you very much. As you say, I think only Snowflake can answer this. Thank you for your advice. – SNWsnow Aug 04 '21 at 08:41

It's related to how Snowflake (or any analytical MPP database) processes data in parallel. As you know, a warehouse cluster may have multiple nodes: a Small warehouse has 2, a Large warehouse has 8, and so on. Each node has 8 instances that process data in parallel; I will call them "workers" for clarity.

So a Small warehouse has 2 × 8 = 16 workers and a Large warehouse has 8 × 8 = 64 workers. If the data size is small (the table has fewer micro-partitions than there are workers), your query will not use the whole warehouse's resources: some workers will sit idle during execution steps such as reading from the table. This is why you get more benefit from the large warehouse when you process larger data.
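
To make this concrete, here is a minimal, purely illustrative model in Python (it assumes 8 workers per node as described above, a fixed per-micro-partition scan cost, and ignores every other overhead):

    import math

    WORKERS_PER_NODE = 8      # per the explanation above
    MS_PER_PARTITION = 10     # assumed fixed cost to scan one micro-partition

    def scan_time_ms(num_partitions: int, nodes: int) -> int:
        """Idealized scan time: partitions are processed in parallel
        waves, one partition per worker per wave."""
        workers = nodes * WORKERS_PER_NODE
        waves = math.ceil(num_partitions / workers)
        return waves * MS_PER_PARTITION

    for partitions in (10, 100, 1_000, 10_000):
        small = scan_time_ms(partitions, nodes=2)  # Small: 16 workers
        large = scan_time_ms(partitions, nodes=8)  # Large: 64 workers
        print(f"{partitions:>6} partitions: Small {small} ms, "
              f"Large {large} ms, ratio {small / large:.1f}")

With only a handful of micro-partitions, both warehouses finish in a single wave and the ratio is close to 1; as the partition count grows, the ratio converges to the 4× node-count difference, which is exactly the pattern in the question's table.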

Gokhan Atil
  • Thank you for your easy-to-understand answer. If work were distributed evenly and every worker kept busy, a large warehouse would always have the advantage; I now understand that with MPP processing, some workers can sit idle for small queries. – SNWsnow Aug 04 '21 at 08:41