Test Description
The test is conducted to find out how CBO uses the clustering feature, to run an ORDER BY query with the clustered key.
Assumptions and Pre-Conditions
According to snowflake docs:
https://docs.snowflake.com/en/user-guide/tables-clustering-keys.html
“ Queries benefit from clustering when the queries filter or sort on the clustering key for the table. Sorting is commonly done for ORDER BY operations, for GROUP BY operations, and for some joins. ”
Conforming to it, if an ORDER BY clause is performed on the clustering key, the optimiser uses it in some way to enhance the query performance.
Test Data
Data used for this testing is the orders table from the snowflake sample_data (shared )
"SAMPLE_DATA"."TPCH_SF1000"."ORDERS"
Steps to be Executed
- Created a new table with the properties of the ORDERS table in an owned database and copied all the values ;
CREATE TABLE ORDERS LIKE "SAMPLE_DATA"."TPCH_SF1000"."ORDERS"; INSERT INTO ORDERS SELECT * FROM "SAMPLE_DATA"."TPCH_SF1000"."ORDERS";
- Removed the clustering key set on the orders table :
ALTER TABLE ORDERS DROP CLUSTERING KEY;
- Created a new column ORDERS_NEW (NUMBER 10,0) after converting the O_ORDERDATE (DATE) into a number.
ALTER TABLE ORDERS ADD COLUMN ORDERS_NEW NUMBER(10,0); UPDATE ORDERS set ORDERS_NEW =TO_NUMBER(REPLACE(TO_VARCHAR(O_ORDERDATE),'-')) ;
- Clustered the table using the ORDERS_NEW column
ALTER TABLE ORDERS CLUSTER BY (ORDERS_NEW) ; ALTER TABLE ORDERS RESUME RECLUSTER ;
- Detected the logical execution plan using EXPLAIN on an query with ORDER BY clause on clustering key ( in this case : ORDERS_NEW)
EXPLAIN SELECT * FROM ORDERS ORDER BY ORDERS_NEW ;
Expected Result
Since after using the clustering key on the table it is supposed to be pre-sorted, the expectation would be that it does not need to be sorted on an execution.
Even if the sorting is in place with the execution plan it should be using the clustering feature, as defined in the documentation in some way to enhance the query performance.
Actual Result
The logical execution plan uses the SORT operation on the clustered table which makes the clustering feature redundant for the ORDER BY clause.
Conclusion and Discussions
In comparison with the expected result to the actual result, it fails to meet the functioning stated by the snowflake.
By this, I would extend our discussion to Why didn't Snowflake use the cluster key? And in what circumstances would it have done so?