Our existing setup used to create a new table for each day, which worked fine with the "WRITE_TRUNCATE" option. However, when we updated our code to use a partitioned table through our Dataflow job, it wouldn't work with WRITE_TRUNCATE.
It works perfectly fine with the write disposition set to "WRITE_APPEND". From what I understood from Beam, WRITE_TRUNCATE may try to delete the table and then recreate it, and since I'm supplying the table decorator, it fails to create a new table.
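For context, the old per-day (sharded) setup looked roughly like this. A minimal sketch, where my_dataset.events is a hypothetical table name, rows is the PCollection being written, and date is a 'YYYYMMDD' string:

    import apache_beam as beam

    # Old setup: a brand-new table per day, so WRITE_TRUNCATE was safe.
    (rows | 'Write({})'.format(date) >> beam.io.Write(
        beam.io.BigQuerySink(
            'my_dataset.events_' + date,  # sharded table, e.g. events_20160101
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))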
Sample Python snippet for the new partitioned-table setup:
    (rows | 'Write({})'.format(date) >> beam.io.Write(
        beam.io.BigQuerySink(
            output_table_name + '$' + date,  # partition decorator, e.g. table$20160101
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
which gives the error:
Table IDs must be alphanumeric
since it tries to recreate the table, and we supply the partition decorator in the table name.
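For reference, the variant that does run successfully differs only in the write disposition (same decorated table name):

    beam.io.BigQuerySink(
        output_table_name + '$' + date,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)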
Here are some of the things that I've tried:
- Updating the write_disposition to WRITE_APPEND, as in the snippet above. Although it works, it defeats the purpose, since running the job again for the same date would duplicate data (see the sketch after this list for a possible workaround).
- Using the command

      bq --apilog /tmp/log.txt load --replace --source_format=NEWLINE_DELIMITED_JSON 'table$20160101' sample_json.json

  to see if I could observe any logs on how truncate actually works (the --replace flag corresponds to WRITE_TRUNCATE), based on a link that I found.
- Tried the approaches from some other links as well, but those also use WRITE_APPEND.
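One workaround I'm considering, though I haven't verified it end to end: delete the target partition first, then run the Dataflow job with WRITE_APPEND, so re-running for the same date does not duplicate data. A minimal sketch assuming a recent google-cloud-bigquery client; the project, dataset, and table names are placeholders:

    from google.cloud import bigquery

    def clear_partition(project, dataset, table, date):
        """Drop a single ingestion-time partition so that a later
        WRITE_APPEND job behaves like a truncate for that day."""
        client = bigquery.Client(project=project)
        # Deleting a decorated table ID removes only that partition.
        client.delete_table(
            '{}.{}${}'.format(dataset, table, date),
            not_found_ok=True)  # don't fail if the partition doesn't exist yet

    # Hypothetical usage before launching the pipeline:
    # clear_partition('my-project', 'my_dataset', 'events', '20160101')

The same thing should also be possible from the command line with bq rm -f 'my_dataset.events$20160101'.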
Is there a way to write to a partitioned table from a Dataflow job using WRITE_TRUNCATE?
Let me know if any additional details are required. Thanks