How can I extract a single partition from a partitioned BigQuery table?

Question

According to the BigQuery docs, I should be able to export a single partition of a partitioned table:

Exporting all data from a partitioned table is the same process as exporting data from a non-partitioned table. For more information, see Exporting table data. To export data from an individual partition, append the partition decorator, $date, to the table name. For example: mytable$20160201.

However running the following extract command extracts the entire table, not just one partition. It is driving me nuts! What am I doing wrong?

bq --location=europe-west2 extract \
--destination_format NEWLINE_DELIMITED_JSON \
--compression GZIP \
bq-project-name:dataset.table_name$20200405 \
"gs://bucket-name/test_ga_sessions*.json.gz"

Adding partitioning information of source table here

I have also confirmed that the partition I am attempting to extract exists

#legacySQL
SELECT
  partition_id,
  creation_time,
  creation_timestamp,
  last_modified_time,
  last_modified_timestamp
FROM
  [dataset.tablename$__PARTITIONS_SUMMARY__]
  where partition_id = '20200405'

Have you tried quoting the table name? The $ operator might be messing with the table name in the terminal — Tlaquetzal, Jun 16 '20 at 15:57
Thanks @Tlaquetzal. I tried that too and it didn't work either — Graham, Jun 18 '20 at 16:15
On the top of my mind, another thing that might be messing with the perception of the partitions is: the table might be partitioned by ingestion time rather than a date column. I noticed that in the bucket-name you're using has the name "ga_sessions" for the end file. If you're using information from the GA export the tables you are looking to export might not even be partitioned. Can you update the question with the partition details of the table you're looking to export? — Tlaquetzal, Jun 18 '20 at 16:50
@Tlaquetzal the table I am exporting contains transformed GA data hence the name. It is not an original GA table. I have updated my question to include partitioning information for the table I am trying to export. — Graham, Jun 24 '20 at 10:33
Thanks for the update. It all looks fine, so it might be an issue with the service. Last things I will check, is, 1) what is the behavior when you try to export a partition that doesn't exist, for example the day 20100101. Does this export all the table as well ?; and 2) Consider that when using wildcard in the destination URI the count in the file name will start at 00000 not the exported date. Let me know if this information doesn't provide new insights — Tlaquetzal, Jun 24 '20 at 20:56
I tried exporting partition 20100101 which doesn't exist and still it exported the whole table. I also pulled one of the generated files as an example (test_ga_sessions000000000000.json.gz) and viewing it I could see data pertaining to 2019-06-01 — Graham, Jun 26 '20 at 09:12
I have also updated my post to show confirmation that the partition I am attempting to extract (20200405) does actually exist — Graham, Jun 26 '20 at 09:18
Hey Graham, I'd like to take a closer look at this. Can you open a private IssueTracker [here](https://issuetracker.google.com/issues/new?component=491370) and share the JobId from one (or more) of the exports with this behavior? After you opened it, you can share here again the reference number of it so I can take a look — Tlaquetzal, Jun 30 '20 at 13:22

score 2 · Answer 1 · answered Jul 04 '20 at 09:51

2

Because I was running the bq extract command in a bash shell, the partition decorator $20200405 was being interpreted as a variable and an empty one at that. Therefore the full partition identifier of bq-project-name:dataset.table_name$20200405 was being interpreted as bq-project-name:dataset.table_name by the time the request reached BigQuery.

In order to get this command to run correctly, all I had to do was escape the $ character of the partition decorator with a backslash as follows:

bq --location=europe-west2 extract \
--destination_format NEWLINE_DELIMITED_JSON \
--compression GZIP \
bq-project-name:dataset.table_name\$20200405 \
"gs://bucket-name/test_ga_sessions*.json.gz"

answered Jul 04 '20 at 09:51

Graham

71
7

I want to export all the partitioned table data, (i don't know the date from which the partition table was created) into the data storage. Is there a better way to do it? – Kabu Sep 15 '21 at 03:58
Can you explain your question a bit more? – Graham Sep 21 '21 at 11:31
I want to export all the partitioned table data into the data storage, what I mean by that is in above code you have mention about exporting the table for just one date i.e. bq-project-name:dataset.table_name\$20200405 . With my case I have to take a backup of all the partitioned data that we have (almost 90GB). I have to take a daily back up for all the partitioned table (all data). Thus was wondering how to do it? and also once we take the backup, how all the data can be restored from data storage (as if all the original data were deleted, and we have to restore all from data storage)? – Kabu Sep 23 '21 at 16:56
To backup an entire table you can simply omit the partition decorator so that would be `bq-project-name:dataset.table_name`. Alternatively, you could query the table metadata to find which partitions exist on the table and backup each one individually. As for restoring from data storage, I suggest referring to the BQ documentation on this. – Graham Oct 05 '21 at 20:24

How can I extract a single partition from a partitioned BigQuery table?

1 Answers1