
I have a dimension table in my current warehouse (Netezza) which has 10 million records and which is being updated on a daily basis.

We are planning to migrate to BigQuery. Should we keep this dimension table as it is?

How should we redesign this large dimension table in BigQuery?

SecretAgentMan
  • I suggest you look at the post [What's the difference between BigQuery and BigTable?](https://stackoverflow.com/questions/39919815/whats-the-difference-between-bigquery-and-bigtable), which explains that BigQuery is a query engine for datasets that don't change much. It's a great choice when your queries require a "table scan" or need to look across the entire database. Bigtable is a database, designed to be the foundation for a large, scalable application. Use Bigtable when you are building any kind of app that needs to read and write data and where scale is a potential issue. – Enrique Zetina Nov 22 '19 at 22:41

1 Answer


Because BigQuery is not designed for frequent updates, implementing a mutable dimension table is not straightforward. The right approach depends on your use case.

But here are some alternatives:

  1. Use an append-only dimension table with an "UpdatedAt" field. Then use a window function to select the latest version of each row (you can even create a view that exposes only the latest version).
  2. Truncate and reload the dimension table daily with the latest version of your data.
  3. Create an external table backed by GCS / Bigtable / Cloud SQL, and apply the dimension updates there.
  4. Keep your dimension table in a separate database and use Cloud Dataflow to perform the join.
  5. Denormalize: store the dimension attributes together with the fact table (yes, there will be a lot of duplication, but sometimes it's worth the cost).
  6. Simply update the dimension table with DML whenever there is a change (note that BigQuery DML is subject to quotas).
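For option 1, a view over the append-only table can hide all but the latest version of each row. The sketch below assumes a hypothetical table `my_dataset.customer_dim` with a business key `customer_id` and an `updated_at` timestamp; adapt the names to your schema:

```sql
-- View exposing only the most recent version of each dimension row.
CREATE OR REPLACE VIEW my_dataset.customer_dim_current AS
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    -- Rank versions of each key, newest first.
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY updated_at DESC
    ) AS rn
  FROM my_dataset.customer_dim
)
WHERE rn = 1;
```

Fact tables can then join against `customer_dim_current` as if it were an ordinary slowly changing dimension, while the underlying table stays insert-only.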

All of these approaches have drawbacks; the right solution may even combine more than one of them.
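For option 6, BigQuery's `MERGE` statement can apply a daily batch of changes as a single upsert, which keeps you well within the DML quotas. This is only a sketch; the staging table `my_dataset.customer_dim_staging` and the column names are assumptions:

```sql
-- Upsert a daily batch of changes from a staging table into the dimension.
MERGE my_dataset.customer_dim AS d
USING my_dataset.customer_dim_staging AS s
ON d.customer_id = s.customer_id
WHEN MATCHED THEN
  -- Overwrite the existing row with the new attribute values.
  UPDATE SET
    name = s.name,
    updated_at = s.updated_at
WHEN NOT MATCHED THEN
  -- Insert rows for keys seen for the first time.
  INSERT (customer_id, name, updated_at)
  VALUES (s.customer_id, s.name, s.updated_at);
```

One `MERGE` per load cycle counts as a single DML operation regardless of how many rows it touches, which is why batching changes this way scales better than per-row updates.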

xcrazy360