
Every weekend I add a few files to a google bucket and then run something from the command line to "update" a table with the new data.

By "update" I mean that I delete the table and then remake it by using all the files in the bucket, including the new files.

I do everything by using python to execute the following command in the Windows command line:

bq mk --table --project_id=hippo_fence-5412 mouse_data.partition_test gs://mybucket/mouse_data/* measurement_date:TIMESTAMP,symbol:STRING,height:FLOAT,weight:FLOAT,age:FLOAT,response_time:FLOAT
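Since I run this from Python, here is a minimal sketch of how I invoke it (the command itself is exactly as above; the use of `subprocess.run` and the guard on `bq` being installed are just how my script happens to do it):

```python
import shutil
import subprocess

# Rebuild the table from every file in the bucket (the command from the
# question, split into arguments). bq must be on PATH for it to run.
cmd = [
    "bq", "mk", "--table",
    "--project_id=hippo_fence-5412",
    "mouse_data.partition_test",
    "gs://mybucket/mouse_data/*",
    "measurement_date:TIMESTAMP,symbol:STRING,height:FLOAT,weight:FLOAT,"
    "age:FLOAT,response_time:FLOAT",
]

if shutil.which("bq"):  # only invoke the CLI when it is actually installed
    subprocess.run(cmd, check=True)
```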

This table is getting massive (>200 GB) and it would be much cheaper for the lab to use partitioned tables.

I've tried to partition the table in a few ways, including what is recommended by the official docs, but I can't make it work.

The most recent thing I tried was simply inserting --time_partitioning_type=DAY, like this:

bq mk --table --project_id=hippo_fence-5412 --time_partitioning_type=DAY mouse_data.partition_test gs://mybucket/mouse_data/* measurement_date:TIMESTAMP,symbol:STRING,height:FLOAT,weight:FLOAT,age:FLOAT,response_time:FLOAT

but that didn't work, giving me the error:

FATAL Flags parsing error: Unknown command line flag 'time_partitioning_type'

How can I make this work?

user1367204
  • You sure you can provide the data in the command line? "gs://mybucket/mouse_data/*"? The usage for the command is "bq mk table_name table_schema". – Hua Zhang Mar 02 '18 at 18:49
  • So the command syntax should be: bq [--global_flags] [--command_flags] [args] e.g.: bq --project_id=... mk --table=... --time_partitioning_type=... – Michael Moursalimov Mar 05 '18 at 19:22

1 Answer


For the existing data, a possible solution is to create an empty partitioned table and then load each bucket file into the desired day partition. Unfortunately, wildcards did not work when I tested this, so the files have to be loaded one by one.

1. Create the partitioned table

bq mk --table --time_partitioning_type=DAY [MY_PROJECT]:[MY_DATASET].[PROD_TABLE] measurement_date:TIMESTAMP,symbol:STRING,height:FLOAT,weight:FLOAT,age:FLOAT,response_time:FLOAT

2. Load each file into the desired day partition. Here is an example for a file from 22 February 2018:

bq load [MY_PROJECT]:[MY_DATASET].[PROD_TABLE]$20180222 gs://MY-BUCKET/my_file.csv
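Since you already drive bq from Python, step 2 can be automated if the date is recoverable from each file name. A hypothetical sketch (the `YYYY-MM-DD`-in-the-filename pattern is an assumption, not something from your question):

```python
import re

def partition_target(table, filename):
    """Build the 'table$YYYYMMDD' partition decorator for bq load,
    assuming the file name contains a YYYY-MM-DD date."""
    m = re.search(r"(\d{4})-(\d{2})-(\d{2})", filename)
    if not m:
        raise ValueError(f"no date found in {filename!r}")
    return f"{table}${''.join(m.groups())}"

# Builds the table$partition argument used in the bq load call above.
target = partition_target("mouse_data.partition_test",
                          "mouse_2018-02-22.csv")
```

Note that on a Unix shell the `$` in the partition decorator must be quoted (e.g. `'mydataset.table$20180222'`) so the shell does not expand it; on the Windows command line from the question it can be passed as-is.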

3. Load the ongoing weekly uploads normally; each new row will automatically land in the partition for the day it is loaded.

bq load [MY_PROJECT]:[MY_DATASET].[PROD_TABLE] gs://MY-BUCKET/files*

MonicaPC
  • Check also this thread https://stackoverflow.com/questions/38993877/migrating-from-non-partitioned-to-partitioned-tables – MonicaPC Mar 16 '18 at 00:17
  • Make sure to have the latest version of Google Cloud SDK by running this command as administrator: `gcloud components update` link: https://cloud.google.com/sdk/docs/release-notes – MonicaPC Mar 16 '18 at 16:19