
I want to read a table in BigQuery using Python and Dataflow. I don't know the name of the table in advance. I'm using templates to pass the table name as follows:

import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery
from apache_beam.options.pipeline_options import PipelineOptions


class DataflowOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--table_name',
            help='Name of table on BigQuery')


def run(argv=None):
    pipeline_options = PipelineOptions()
    dataflow_options = pipeline_options.view_as(DataflowOptions)

    with beam.Pipeline(options=pipeline_options) as pipeline:
        table_spec = bigquery.TableReference(
            projectId='MyProject',
            datasetId='MyDataset',
            tableId=str(dataflow_options.table_name))

        p = (pipeline | 'Read Table' >> beam.io.Read(beam.io.BigQuerySource(table_spec)))


if __name__ == '__main__':
    run()

But when I launch the job, I get the following error:

Workflow failed. Causes: S01:Read Table+Batch Users/ParDo(_GlobalWindowsBatchingDoFn)+Hash Users+Upload to Ads failed., BigQuery getting table "RuntimeValueProvider(option: table_name, type: str, default_value: None)" from dataset "MyDataset" in project "MyProject" failed., BigQuery execution failed., Error:
 Message: Invalid table ID "RuntimeValueProvider(option: table_name, type: str, default_value: None)".
 HTTP Code: 400

I read this answer, but hasn't anything changed since 2017?
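For what it's worth, I think I see what's happening: calling `str()` on the provider at pipeline-construction time bakes in its repr instead of the runtime value. A minimal stand-in (illustration only, not Beam's real classes) reproduces the behavior:

```python
# Toy stand-in for Beam's ValueProvider machinery (illustration only;
# not the actual apache_beam classes).
class RuntimeValueProvider:
    def __init__(self, option, typ, default_value=None):
        self.option = option
        self.typ = typ
        self.default_value = default_value
        self.runtime_value = None  # only filled in when the job actually runs

    def get(self):
        # At graph-construction time there is no value yet.
        if self.runtime_value is None:
            raise RuntimeError('value not available at construction time')
        return self.runtime_value

    def __str__(self):
        return ('RuntimeValueProvider(option: %s, type: %s, default_value: %s)'
                % (self.option, self.typ.__name__, self.default_value))


provider = RuntimeValueProvider('table_name', str)

# str() at construction time yields the repr -- exactly the "invalid
# table ID" string from the error message above:
print(str(provider))
# -> RuntimeValueProvider(option: table_name, type: str, default_value: None)

# Only once the runner injects the value does get() return the real name:
provider.runtime_value = 'my_table'
print(provider.get())
# -> my_table
```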

1 Answer


According to the documentation mentioned here, `TableReference` takes the parameters `(dataset_ref, table_id)`. From your code snippet, it looks like the arguments are placed incorrectly.

with beam.Pipeline(options=pipeline_options) as pipeline:
    dataset_ref = bigquery.DatasetReference('my-project-id', 'some_dataset')
    table_spec = bigquery.TableReference(
        dataset_ref,
        table_id=str(dataflow_options.table_name))
Jayadeep Jayaraman