I have a BigQuery table that's clustered by several columns, let's call them `client_id` and `attribute_id`.
What I'd like is to submit one job or command that exports that table's data to Cloud Storage, saving each cluster (so each combination of `client_id` and `attribute_id`) to its own object. The final URIs might look something like this:

`gs://my_bucket/{client_id}/{attribute_id}/object.avro`
I know I could pull this off by iterating over all the possible combinations of `client_id` and `attribute_id`, using a client library to query each combination's rows into a BigQuery temp table, and then exporting that table to a correctly named object, and I could run those jobs asynchronously.
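Roughly this kind of loop is what I mean (just a sketch with the `google-cloud-bigquery` Python client; the project/dataset/table names, the bucket, and the assumption that both columns are strings are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

SOURCE = "my_project.my_dataset.my_table"  # placeholder table
BUCKET = "my_bucket"                       # placeholder bucket

# Find every (client_id, attribute_id) combination present in the table.
pairs = client.query(
    f"SELECT DISTINCT client_id, attribute_id FROM `{SOURCE}`"
).result()

for row in pairs:
    # Query this combination's rows into a temp-style destination table.
    dest = f"my_project.my_dataset.tmp_{row.client_id}_{row.attribute_id}"
    query_job = client.query(
        f"""
        SELECT * FROM `{SOURCE}`
        WHERE client_id = @client_id AND attribute_id = @attribute_id
        """,
        job_config=bigquery.QueryJobConfig(
            destination=dest,
            write_disposition="WRITE_TRUNCATE",
            query_parameters=[
                bigquery.ScalarQueryParameter("client_id", "STRING", row.client_id),
                bigquery.ScalarQueryParameter("attribute_id", "STRING", row.attribute_id),
            ],
        ),
    )
    query_job.result()  # wait for the query to finish

    # Export that table to its own object in Cloud Storage.
    uri = f"gs://{BUCKET}/{row.client_id}/{row.attribute_id}/object.avro"
    extract_job = client.extract_table(
        dest,
        uri,
        job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    )
    extract_job.result()  # or collect the jobs and wait on them asynchronously
```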
But.... I imagine all the clustered data is already stored in a format somewhat like what I'm describing, and I'd love to avoid the unnecessary cost and headache of writing the script to do it myself.
Is there a way to accomplish this already without requesting a new feature to be added?
Thanks!