6

From the BigQuery page on partitioned tables:

Each table can have up to 2,000 partitions.

We planned to partition our data by day. Most of our queries will be date based but we have about 5 years historical data and plan to collect more each day from now. With only 2000 partitions: 2000/365 gives us about 5.5 years worth of data.

What is the best practice for tables wanting more than 2000 partitions?

  • Create a different table per year and join tables when required?
  • Is it possible to partition by week or month instead?
  • Can that 2000 partition limit be increased if you ask support?

Update: Table limit is now 4000 partitions.

andy boot
  • 11,355
  • 3
  • 53
  • 66
  • It seems it was recently updated to 2500. The docs now say: " Each partitioned table can have up to 2,500 partitions." – andy boot Jul 20 '17 at 13:53
  • The partition limit is now 4000. – skeller88 Nov 18 '18 at 01:36
  • See the following question to see how to get around this limit by partitioning by week/month/year: https://stackoverflow.com/a/56125049/132438 – Felipe Hoffa May 14 '19 at 07:28
  • 1
    Rather than asking a duplicate of this question and then answering it yourself, why not add your answer to this question? @FelipeHoffa – andy boot May 17 '19 at 11:33
  • Because the question here was "how to get past the 2000 limit", and there's no 2000 limit any more since 2017. Hence I would consider this question deprecated. – Felipe Hoffa May 17 '19 at 12:45
  • @andyboot funny - someone downvoted the new question/answer after you wrote this comment. Hopefully you'll find the answer useful regardless of where it lives! – Felipe Hoffa May 17 '19 at 16:33
  • 1
    I honestly don't know what is meant to happen here @FelipeHoffa. I do appreciate that you have written a more up to date and more relevant answer to this problem. But it also doesn't seem right to duplicate this original question and then mark the original as a duplicate. Perhaps older questions like this are meant to be culled. I don't know. I'll leave that to the SO Mods. – andy boot May 18 '19 at 16:14
  • @andyboot I'm just trying to be as helpful as possible. I was wondering: What's the best way to help people with a similar (but not quite the same) question. Should I bring them to this 2017 question with 4 different answers, or can I somehow give them a less confusing alternative? I might have made the "wrong" choice here - but I'm only trying to help. – Felipe Hoffa May 22 '19 at 03:54
  • 1
    I apprecate you are trying to be helpful @FelipeHoffa. But I don't know what to do. I have raised this discussion on the SO chat and asked a mod for help. – andy boot May 23 '19 at 13:24
  • Why do you feel wronged? What's the worst that could happen? Who are you trying to help? – Felipe Hoffa May 23 '19 at 13:57
  • I don't feel wronged. I'd just like a mod to merge the questions or close one down. Perhaps I should have been clearer - I've asked a mod to merge the questions. – andy boot May 23 '19 at 15:12
  • Note for the mods reading this: This question contains multiple questions (at least 3), hence it begs to get multiple answers. The newer one asks one straight question with one straight answer. I'll say that's better for users. Merging would only create further confusion - for no benefit (see previous Andy reply). https://meta.stackexchange.com/questions/39223/one-post-with-multiple-questions-or-multiple-posts – Felipe Hoffa May 23 '19 at 15:33

3 Answers3

5

We are in the process of enforcing the 2000 partitions limitations (so we documented that a bit early to give our users early notice). So it is a soft limit at this point.

Creating large numbers of partitions has performance implications, so we suggest limiting tables to 2000 partitions. We have some room here depending on the schema of the table, so it would be reasonable to ask support if an increase is possible. We will review it depending on how much resources we think would be needed for operations against that table.

We hope to support a larger number of partitions (upto 10K) at some point in the future, but we are working through the design and implementation changes necessary for that (we don't have an ETA on it at this point).

Pavan Edara
  • 2,285
  • 1
  • 12
  • 12
  • Any update on this? I see that the number of partitions can now be 2500 max but for daily partitioned tables this means that for storing 10 years of data splitting is still needed – Tobi Apr 16 '18 at 06:51
  • This doesn't actually answer the question - _What is the best practice for tables wanting more than 2000 partitions?_ DAY is still the only partition type. Could the new [clusters](https://cloud.google.com/bigquery/docs/clustered-tables) feature help if a partitioned table needs to hold more than 2000 days of data? – intotecho Jan 08 '19 at 23:12
  • The current limit is 4000 partitions. You can indeed cluster your partitioned table if you need more than that. – Pavan Edara Jan 10 '19 at 00:24
  • I am hitting the limit of `4000` already. Will there be support for > 4000 soon? Or could you suggest what BigQuery users with day partitions more than that should do? – cryanbhu Jun 01 '20 at 06:30
  • @PavanEdara same here I hit 4000 already, what should I do if I need my partitions to be by day? – Mee Mar 09 '22 at 10:56
2

Regarding your question "Is it possible to partition by week or month instead?", there is a feature request to get more flexibility on the type of partitions: https://issuetracker.google.com/issues/35905817

If we are able to also have INT as a type of partition, then it would be easy to define 'months partitions' in the way YYYYMM.

Sourygna
  • 679
  • 4
  • 14
2

The limit is now 4,000 partitions which is just over 10 years of data. However if you have more than 10 years of data and would like it partitioned by day one workaround we have used is splitting your table into decades and then writing a view on top to union the decade tables together.

When querying the view with the date partitioned field in the where clause BigQuery knows to only process the required partitions even if this is across multiple or within a single table.

We have used this approach to ensure business users (data analysts and report developers) only need to worry about a single table but still access the performance and cost benefits of partitioned tables.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Matt Laz
  • 230
  • 1
  • 2
  • 8