
Currently I have exported the data as shards to Google Cloud Storage, downloaded them to a server, and I am streaming them into the partitioned table. The problem is that it takes a long time: it streams about 1 GB in 40 minutes. Please help me make it faster. My machine has 12 CPU cores and 20 GB of RAM.

vidhya sagar

1 Answer

Instead of downloading the files and streaming them, you can load the data directly from Google Cloud Storage into your partition with a load job, for example using the bq command-line tool or an API call.

To update data in a specific partition, append a partition decorator to the name of the partitioned table when loading data into the table. A partition decorator represents a specific date and takes the form:

$YYYYMMDD
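As a small sketch of that form, the hypothetical bash helper below (partition_table is an illustrative name, not part of bq) turns a table name and a YYYY-MM-DD date into a decorated name. It also shows why the command further down wraps the table name in single quotes: unquoted, the shell would read $2 in $20160101 as a positional parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a partition-decorated table name such as
# mydataset.table1$20160101 from a table name and a YYYY-MM-DD date.
partition_table() {
  local table="$1" day="$2"
  # "${day//-/}" strips the dashes (bash-only substitution).
  # The format string is single-quoted, so "$" is printed literally.
  printf '%s$%s\n' "$table" "${day//-/}"
}
```

For example, `partition_table mydataset.table1 2016-01-01` prints `mydataset.table1$20160101`. Passing it as `"$(partition_table mydataset.table1 2016-01-01)"` is safe, because the output of a command substitution is not expanded again.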

For example, the following command replaces the data in the entire partition for the date January 1, 2016 (20160101) in a partitioned table named mydataset.table1 with content loaded from a Cloud Storage bucket:

bq load --replace --source_format=NEWLINE_DELIMITED_JSON 'mydataset.table1$20160101' gs://[MY_BUCKET]/replacement_json.json
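Since the question's data is already sharded in Cloud Storage, a single load job can take every shard at once through a Cloud Storage wildcard URI, so nothing has to pass through the server. The sketch below assumes the shards share a name prefix (export-*.json is a hypothetical pattern), and the function name and the BQ variable (set BQ=echo to print the command instead of running it) are illustrative additions, not part of bq.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: load all shards matching a wildcard URI into one
# partition with a single bq load job.
# BQ defaults to the real bq CLI; BQ=echo prints the command as a dry run.
load_shards_into_partition() {
  local table="$1" day="$2" uri_pattern="$3"
  # "\$" keeps a literal dollar sign for the partition decorator;
  # the URI stays quoted so the local shell does not try to glob "*".
  "${BQ:-bq}" load --replace --source_format=NEWLINE_DELIMITED_JSON \
    "${table}\$${day}" "$uri_pattern"
}
```

Usage: `load_shards_into_partition mydataset.table1 20160101 'gs://[MY_BUCKET]/export-*.json'`. As in the command above, `--replace` overwrites the whole partition; without it, bq load appends to the existing rows.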

Note: Because partitions in a partitioned table share the table schema, replacing data in a partition will not replace the schema of the table. Instead, the schema of the new data must be compatible with the table schema. To update the schema of the table with the load job, use configuration.load.schemaUpdateOptions.
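On the command line, the counterpart of schemaUpdateOptions is the bq load flag `--schema_update_option`. BigQuery accepts it when appending, or when replacing a partition addressed with a decorator, as here. The sketch below assumes the new JSON carries extra fields and relies on `--autodetect` to supply the new schema; the function name and BQ dry-run variable are again hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: replace one partition and let the load job add any
# new columns found in the source data to the table schema.
# BQ defaults to the real bq CLI; BQ=echo prints the command as a dry run.
replace_partition_allowing_new_fields() {
  local table="$1" day="$2" uri="$3"
  "${BQ:-bq}" load --replace --autodetect \
    --schema_update_option=ALLOW_FIELD_ADDITION \
    --source_format=NEWLINE_DELIMITED_JSON \
    "${table}\$${day}" "$uri"
}
```

ALLOW_FIELD_RELAXATION is the other accepted value, for relaxing REQUIRED columns to NULLABLE.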

Read more: https://cloud.google.com/bigquery/docs/creating-partitioned-tables

Pentium10