
I'm comparing the performance of clustering with that of partitioning.

When I compare a partitioned table with a clustered table, the amount of data accessed is sometimes larger for the clustered table (e.g., 122.4 MB with clustering vs. 35.6 MB with partitioning).

I suspect this is due to a minimum data size per cluster.

Is there any way to find out what that limit is? Or is there another cause for the difference in accessed data size?

Edit: I found posts 1 and 2 by an ex-Googler.

Post 2 says that "each cluster of data in BigQuery has a minimum size", and Post 1 says that "If you have less than 100MB of data per day, clustering won't do much for you".

From these posts, I inferred that the larger accessed size for the clustered table is caused by the minimum size of a cluster.

saket

1 Answer


Clusters are not like partitions. In fact, there is no guarantee that there will be one cluster per column value (or, if you cluster on multiple columns, one per combination of values). This is also why BigQuery cannot give you an accurate estimate of how much data a query will scan before running it, as it does for partitions. Different partitions, by contrast, use different storage blocks.
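To make the difference concrete, here is a toy model in Python. It is only a sketch and not BigQuery's actual storage layout: the `Block` class, the block sizes, and `ROW_KB` are all made-up assumptions. Rows are sorted by key and cut into fixed-size blocks, and a query for one key has to read every block whose key range covers that key.

```python
from dataclasses import dataclass

ROW_KB = 100  # hypothetical size of one row, in KB


@dataclass
class Block:
    min_key: int
    max_key: int
    rows: int


def build_blocks(keys, block_rows):
    """Sort rows by key and cut them into blocks of at most block_rows rows."""
    ordered = sorted(keys)
    blocks = []
    for start in range(0, len(ordered), block_rows):
        chunk = ordered[start:start + block_rows]
        blocks.append(Block(chunk[0], chunk[-1], len(chunk)))
    return blocks


def scanned_kb(blocks, key):
    """KB read when every block whose key range covers `key` must be scanned."""
    return sum(b.rows * ROW_KB for b in blocks if b.min_key <= key <= b.max_key)


# 10 key values (think: days) with 100 rows each, i.e. 10,000 KB per key.
keys = [day for day in range(10) for _ in range(100)]

# "Partitioned": block boundaries line up exactly with key values.
partitioned = build_blocks(keys, block_rows=100)
# "Clustered": blocks are larger than one key's data, so keys share blocks.
clustered = build_blocks(keys, block_rows=400)

partition_kb = scanned_kb(partitioned, key=5)
cluster_kb = scanned_kb(clustered, key=5)
print(partition_kb, cluster_kb)  # 10000 40000
```

In this model the filter on a single key reads four times as much data from the "clustered" layout, simply because the block that holds key 5 also holds keys 4, 6 and 7.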

Also, consider that BigQuery performs auto-clustering (for free), which rewrites the clusters so that the table ends up with more efficient ones. This is required because inserting and deleting data leaves the clusters very skewed, which makes queries inefficient. As a result, the same query can scan a different amount of data even if no data has been inserted or deleted, if BigQuery performed auto-clustering in between.
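The effect of re-clustering on scanned data can be illustrated with a small Python sketch. This is a toy model, not how BigQuery implements auto-clustering: blocks are plain lists of key values, `recluster` is a hypothetical helper, and the block size of 6 is chosen only so that the numbers come out cleanly.

```python
def scanned_rows(blocks, key):
    """Rows read when every block whose key range covers `key` is scanned."""
    return sum(len(b) for b in blocks if min(b) <= key <= max(b))


def recluster(blocks, block_rows):
    """Re-sort all rows and cut them into fresh blocks (toy auto-clustering)."""
    rows = sorted(k for b in blocks for k in b)
    return [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]


# Well-clustered starting point: keys 0..3, five rows each, one key per block.
blocks = [[k] * 5 for k in range(4)]
before_insert = scanned_rows(blocks, 2)

# New rows arrive unsorted and land in an extra block with a wide key range.
blocks.append([0, 3, 1, 2])
after_insert = scanned_rows(blocks, 2)

# No rows are added or removed here; only the block layout changes.
blocks = recluster(blocks, block_rows=6)
after_recluster = scanned_rows(blocks, 2)

print(before_insert, after_insert, after_recluster)  # 5 9 6
```

Between the second and third measurement the data is identical, yet the same lookup reads fewer rows because the blocks were rebuilt.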

Another effect of this implementation is that a single table has a maximum number of partitions (4000), whereas there is no restriction on the number of distinct key values used for clustering.

So, a single cluster in BigQuery may contain multiple clustering values, and the underlying clustered data blocks may change automatically due to auto-clustering.

Alessandro