I'm trying to pass a BigQuery table name as a ValueProvider for an Apache Beam pipeline template. According to the Beam documentation and this StackOverflow answer, it is possible to pass a ValueProvider to `apache_beam.io.gcp.bigquery.ReadFromBigQuery`.
This is the code for my pipeline:
```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class UserOptions(PipelineOptions):
    """Define runtime arguments."""

    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument('--input', type=str)
        parser.add_value_provider_argument('--output', type=str)


pipeline_options = PipelineOptions()
p = beam.Pipeline(options=pipeline_options)
user_options = pipeline_options.view_as(UserOptions)

(p | 'Read from BQ Table' >> beam.io.gcp.bigquery.ReadFromBigQuery(
    user_options.input
))
```
When I run the code locally, I pass the value for `user_options.input` on the command line: `--input projectid.dataset_id.table`.
However, I get this error:

```
ValueError: A BigQuery table or a query must be specified
```
I have tried:

- Passing `projectid:dataset_id.table` instead.
- Using `bigquery.TableReference` -> not possible.
- Using `f'{user_options.input}'`.
- Passing a query -> works when run locally, but does not work when I call the template on GCP. Error statement:

```
missing dataset while no default dataset is set in the request.", "errors": [ { "message": "Table name "RuntimeValueProvider(option: input, type: str, default_value: None)" missing dataset while no default dataset is set in the request.", "domain": "global", "reason": "invalid" } ], "status": "INVALID_ARGUMENT" } } >
```
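My guess from that error is that the f-string attempt bakes the provider's string form into the table spec at template build time, before any runtime value exists. A minimal sketch of what I think is happening, using a simplified stand-in class (not the real `apache_beam` `RuntimeValueProvider`):

```python
class FakeRuntimeValueProvider:
    """Simplified stand-in for Beam's RuntimeValueProvider (assumption: the
    real class only resolves its value when the pipeline actually runs)."""

    def __init__(self, option_name, value_type, default_value=None):
        self.option_name = option_name
        self.value_type = value_type
        self.default_value = default_value

    def __str__(self):
        # Mirrors the string that shows up in the BigQuery error above.
        return (f"RuntimeValueProvider(option: {self.option_name}, "
                f"type: {self.value_type.__name__}, "
                f"default_value: {self.default_value})")


provider = FakeRuntimeValueProvider("input", str)

# The f-string is evaluated when the template is built, so the provider's
# string form (not the runtime table name) ends up in the table spec:
table_spec = f"{provider}"
print(table_spec)
# -> RuntimeValueProvider(option: input, type: str, default_value: None)
```

This would explain why BigQuery complains about a table literally named `RuntimeValueProvider(...)` with no dataset.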
What am I missing?