4

Is it possible to use the equivalent of --autodetect in Dataflow?

i.e. can we load data into a BQ table without specifying a schema, equivalent to how we can load data from a CSV with --autodetect?

(potentially related question)

Maximilian
  • Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the [How to Ask](http://stackoverflow.com/help/how-to-ask) page for help clarifying this question. – Mikhail Berlyant Apr 29 '17 at 00:51
  • Can we write to BQ without specifying a schema? – Maximilian Apr 29 '17 at 03:34

2 Answers

8

If you are using protocol buffers as the objects in your PCollections (which should perform very well on the Dataflow back end), you might be able to use a util I wrote a while ago. It converts the protobuf schema into a BigQuery schema at runtime by inspecting the protobuf descriptor.

I quickly uploaded it to GitHub. It's a work in progress, but you might be able to use it, or be inspired to write something similar using Java reflection (I might do it myself at some point).

You can use the util as follows:

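// Derive the BigQuery schema from the descriptor of the generated protobuf class.
TableSchema schema = ProtobufUtils.makeTableSchema(ProtobufClass.getDescriptor());
// Create the table with that schema if needed, and replace any existing rows.
enhanced_events.apply(BigQueryIO.Write.to(tableToWrite).withSchema(schema)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));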

Here the create disposition creates the table with the specified schema if it does not exist yet, and ProtobufClass is the class generated from your Protobuf schema by the proto compiler.
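For plain Java objects, the reflection idea mentioned above could look roughly like the sketch below. This is not the util from the answer and not part of any SDK: the class ReflectionSchemaSketch, the Event POJO, and the type mapping (including the fallback to STRING for unknown types) are all assumptions made for illustration. It only produces column name and type pairs, sorted by name, and has no Dataflow or BigQuery dependencies.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any SDK: infers BigQuery column types from a POJO.
class ReflectionSchemaSketch {

    // Example POJO, used only for illustration.
    static class Event {
        String userId;
        long timestampMillis;
        double score;
        boolean active;
    }

    // Maps a Java field type to a BigQuery type name.
    // Assumption: anything not listed here is stored as STRING.
    static String bigQueryType(Class<?> type) {
        if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) {
            return "INTEGER";
        }
        if (type == float.class || type == Float.class
                || type == double.class || type == Double.class) {
            return "FLOAT";
        }
        if (type == boolean.class || type == Boolean.class) {
            return "BOOLEAN";
        }
        return "STRING";
    }

    // Returns column name -> BigQuery type for every instance field of the class.
    // A TreeMap is used because getDeclaredFields() guarantees no particular order.
    static Map<String, String> inferColumns(Class<?> clazz) {
        Map<String, String> columns = new TreeMap<>();
        for (Field field : clazz.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not row data.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            columns.put(field.getName(), bigQueryType(field.getType()));
        }
        return columns;
    }

    public static void main(String[] args) {
        inferColumns(Event.class)
                .forEach((name, type) -> System.out.println(name + ": " + type));
    }
}
```

Each entry could then be turned into a TableFieldSchema (setName, setType) and collected into the TableSchema passed to withSchema. Nested and repeated fields are not handled here.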

Matthias Baetens
0

I'm not sure about reading from BQ, but for writes I think something like this will work on the latest Java SDK.

// CREATE_NEVER: the table must already exist; no schema is passed to the write.
.apply("WriteBigQuery", BigQueryIO.Write
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    .to(outputTableName));


Note: the BigQuery table name must be of the form <project_name>:<dataset_name>.<table_name>.
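To make the expected table name format concrete, here is a small sketch that splits such a string into its three parts and rejects anything else. TableSpecSketch and parse are hypothetical names, not SDK calls, and the check is purely structural: it does not validate which characters BigQuery allows in each part.

```java
// Hypothetical helper, not part of any SDK: structural check of a table spec.
class TableSpecSketch {

    // Splits "<project_name>:<dataset_name>.<table_name>" into {project, dataset, table}.
    // Throws IllegalArgumentException if the string does not have that shape.
    static String[] parse(String spec) {
        // Use the last colon so that project IDs containing a colon still split correctly.
        int colon = spec.lastIndexOf(':');
        int dot = spec.lastIndexOf('.');
        boolean malformed = colon <= 0                   // missing or empty project
                || dot <= colon + 1                      // missing or empty dataset
                || dot == spec.length() - 1              // empty table
                || spec.indexOf('.', colon + 1) != dot;  // more than one dot after the project
        if (malformed) {
            throw new IllegalArgumentException(
                    "Expected <project_name>:<dataset_name>.<table_name>, got: " + spec);
        }
        return new String[] {
            spec.substring(0, colon),
            spec.substring(colon + 1, dot),
            spec.substring(dot + 1)
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("my-project:my_dataset.my_table");
        System.out.println(String.join(" | ", parts));
    }
}
```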
Alex Amato
  • But that only works if the Table already exists - i.e. `BigQueryDisposition.CREATE_NEVER: fail the write if does not exist.` – Maximilian Apr 29 '17 at 04:29
  • Yeah, in this case we need a table that already exists, since we are trying to detect the schema of an existing table. If the goal/question is to define a schema and create a table based on the Java data types, then I don't think we support that. – Alex Amato May 23 '17 at 22:16
  • What Input does it expect? Objects? – Tobi Jun 20 '18 at 06:54