Background: We are using the Cloud Dataflow runner in Beam 2.0 to ETL our data into our warehouse in BigQuery. We would like to use the BigQuery Client Libraries (Beta) to create the schema of our data warehouse before the Beam pipelines populate it with data. (Reasons: full control over table definitions, e.g. partitioning; ease of creating DW instances, i.e. datasets; separation of ETL logic from DW design; and code modularisation.)
Problem: BigQueryIO in Beam uses the TableFieldSchema and TableSchema classes under com.google.api.services.bigquery.model to represent BigQuery fields and schemas, while the BigQuery Client Libraries use TableDefinition under the com.google.cloud.bigquery package for the same purpose, so the field and schema definitions cannot be defined in one place and reused in the other.
Is there a way to define the schema in one place and reuse it in both?
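To illustrate what we are after, here is a minimal sketch using only the JDK. The names SharedSchema, FieldSpec, EVENTS and toJsonSchema, as well as the example fields, are hypothetical and not part of either library. The idea is to declare the fields once in a library-neutral form and derive each library's representation from that; the sketch only renders the BigQuery JSON schema layout ({"fields": [...]}), which we assume could be handed to Beam (for example via BigQueryIO's withJsonSchema, if that is available in our version), while the same list would still need to be mapped to the client library's own field objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SharedSchema {

    /** Hypothetical library-neutral description of one column. */
    public static final class FieldSpec {
        public final String name;
        public final String type; // BigQuery type name, e.g. STRING, TIMESTAMP
        public final String mode; // NULLABLE, REQUIRED or REPEATED

        public FieldSpec(String name, String type, String mode) {
            this.name = name;
            this.type = type;
            this.mode = mode;
        }
    }

    // The single place where the (illustrative) table fields are declared.
    public static final List<FieldSpec> EVENTS = Arrays.asList(
            new FieldSpec("event_id", "STRING", "REQUIRED"),
            new FieldSpec("event_time", "TIMESTAMP", "REQUIRED"),
            new FieldSpec("payload", "STRING", "NULLABLE"));

    /**
     * Renders the fields in the BigQuery JSON schema layout.
     * No JSON escaping is done, so this assumes plain identifier-like names.
     */
    public static String toJsonSchema(List<FieldSpec> fields) {
        return fields.stream()
                .map(f -> String.format(
                        "{\"name\":\"%s\",\"type\":\"%s\",\"mode\":\"%s\"}",
                        f.name, f.type, f.mode))
                .collect(Collectors.joining(",", "{\"fields\":[", "]}"));
    }

    public static void main(String[] args) {
        // Prints the JSON schema derived from the shared definition.
        System.out.println(toJsonSchema(EVENTS));
    }
}
```

A second converter from the same FieldSpec list to the client library's types would complete the picture, but we would prefer a supported way over maintaining such glue code ourselves.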
Thanks, Soby
P.S. We are using the Java SDK in Beam.