
The existing setup we had used to create a new table for each day, which worked fine with the "WRITE_TRUNCATE" option. However, when we updated our code to use a partitioned table through our Dataflow job, it wouldn't work with WRITE_TRUNCATE.

It works perfectly fine with the write disposition set to "WRITE_APPEND". From what I understood of Beam, it may try to delete the table and then recreate it; since I'm supplying the table decorator, it fails to create a new table.

Sample Python snippet:

beam.io.Write(
    'Write({})'.format(date),
    beam.io.BigQuerySink(
        output_table_name + '$' + date,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

which gives the error:

Table IDs must be alphanumeric

since it tries to recreate the table, and we supply the partition decorator in the argument.

Here are some of the things that I've tried:

  1. Updating the write disposition to WRITE_APPEND. Although it works, it defeats the purpose, since running the job for the same date again would duplicate data.
  2. Running

bq --apilog /tmp/log.txt load --replace --source_format=NEWLINE_DELIMITED_JSON 'table.$20160101' sample_json.json

to see if I could observe any logs on how truncate actually works, based on a link I found.
  3. Trying some other approaches from links I found, but those also use WRITE_APPEND.
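To make attempt 1's failure mode concrete: with append semantics, re-running the same date writes the rows a second time, while truncate semantics would replace the partition. A toy model of the two dispositions (plain Python, no BigQuery involved; the function and names are illustrative, not Beam's API):

```python
# Toy model of per-partition write dispositions: 'table' maps a
# partition date to its list of rows. Not real BigQuery, just the
# semantics the question is about.
def write(table, date, rows, disposition):
    if disposition == 'WRITE_TRUNCATE':
        table[date] = list(rows)                  # replace the partition
    elif disposition == 'WRITE_APPEND':
        table.setdefault(date, []).extend(rows)   # duplicates on re-run
    return table

table = {}
write(table, '20160101', ['a', 'b'], 'WRITE_APPEND')
write(table, '20160101', ['a', 'b'], 'WRITE_APPEND')   # re-run the job
print(len(table['20160101']))  # 4: the re-run duplicated the data

write(table, '20160101', ['a', 'b'], 'WRITE_TRUNCATE')  # re-run is safe
print(len(table['20160101']))  # 2
```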

Is there a way to write to a partitioned table from a Dataflow job using the WRITE_TRUNCATE disposition?

Let me know if any additional details are required. Thanks
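One pattern worth noting (my own assumption, not something confirmed above): clear the target partition yourself before launching the job, then run with WRITE_APPEND so re-runs stay idempotent. A sketch with placeholder names; the actual `bq rm` call is left as a comment since it needs credentials:

```python
def partition_ref(table, date):
    """Build a partition decorator reference, e.g. 'dataset.table$20160101'."""
    return '{}${}'.format(table, date)

# Placeholder dataset/table names, purely for illustration.
ref = partition_ref('my_dataset.my_table', '20160101')
print(ref)  # my_dataset.my_table$20160101

# Hypothetical pre-step (requires authentication), followed by running
# the Dataflow job with CREATE_NEVER + WRITE_APPEND:
#   bq rm -f -t 'my_dataset.my_table$20160101'
```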

Sirius
  • The failure to create the table with the partition decorator may be a bug. Let me check and get back to you. – Pablo Feb 21 '17 at 21:21
  • Can you provide a stack trace for your 'Table IDs must be alphanumeric'? – Pablo Feb 21 '17 at 21:51
  • I checked with the IO dev. It seems that this is not supported now. : / – Pablo Feb 21 '17 at 22:02
  • Thanks for replying Pablo :), I was only hoping it does not delete the table for TRUNCATE and just clears all the rows for that partition, but I guess it doesn't work that way: [beam](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py#L939). – Sirius Feb 22 '17 at 06:12
  • @Sirius digging this back up (albeit being a very old question) as I've run into a very similar scenario. Did you end up submitting a Jira card to [this](https://issues.apache.org/jira/browse/BEAM-1743?filter=-4&jql=project%20%3D%20BEAM%20ORDER%20BY%20createdDate%20DESC) page or solving it with some other approach that wasn't discussed here? – anddt Aug 30 '21 at 15:46

1 Answer


Seems like this is not supported at this time. Credit goes to @Pablo for finding out from the IO dev.

According to the Beam documentation on GitHub, their JIRA page is the appropriate place to request such a feature. I'd recommend filing a feature request there and posting a link in a comment here so that others in the community can follow along and show their support.
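As background on the "Table IDs must be alphanumeric" error: the sink validates the table ID before recreating the table, and the `$` decorator fails that check. A sketch of such a validation (illustrative only, not Beam's exact code):

```python
import re

# Plain table IDs contain only letters, digits, and underscores, so a
# check like this rejects the partition decorator form.
TABLE_ID_RE = re.compile(r'^[A-Za-z0-9_]+$')

print(bool(TABLE_ID_RE.match('my_table')))           # True: plain ID passes
print(bool(TABLE_ID_RE.match('my_table$20160101')))  # False: '$' is rejected
```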

Nicholas