
I have a pipeline that is loading a CSV file from GCS into BQ. The details are here: Import CSV file from GCS to BigQuery.

I'm splitting the CSV in a ParDo into a TableRow where some of the fields are empty.

String inputLine = c.element();

String[] split = inputLine.split(",");

TableRow output = new TableRow();
output.set("Event_Time", split[0]);
output.set("Name", split[1]);
...
c.output(output);

My question is, how can I have the empty fields show up as a null in BigQuery? Currently they are coming through as empty fields.

R. Gault
1 Answer


It's turning up in BigQuery as an empty String because split() puts an empty String, not null, into the array for consecutive delimiters like ",,".

Two options:

  1. Check for empty String in your result array and don't set the field in output.
  2. Check for empty String in your result array and explicitly set null for the field in output.

Either way will result in null for BigQuery.
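A minimal sketch of both options. It uses a plain HashMap as a stand-in for TableRow so it runs without the Beam/BigQuery dependencies; the field names and sample line are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class NullFieldSketch {
    public static void main(String[] args) {
        String line = "2017-07-16 12:00:00,,click";
        // -1 keeps trailing empty fields in the array
        String[] split = line.split(",", -1);

        // Option 1: skip empty fields entirely --
        // BigQuery treats a missing field as NULL.
        Map<String, Object> row1 = new HashMap<>();
        if (!split[0].isEmpty()) row1.put("Event_Time", split[0]);
        if (!split[1].isEmpty()) row1.put("Name", split[1]);

        // Option 2: explicitly store null for empty fields.
        Map<String, Object> row2 = new HashMap<>();
        row2.put("Event_Time", split[0].isEmpty() ? null : split[0]);
        row2.put("Name", split[1].isEmpty() ? null : split[1]);

        System.out.println(row1); // no "Name" key at all
        System.out.println(row2); // "Name" present with null value
    }
}
```

With the real TableRow, the same pattern applies: either don't call set() for the empty field, or call set("Name", null).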

Note: be careful splitting Strings in Java like this. split() with no limit argument removes trailing empty Strings from the result. Use split(",", -1) instead.
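To illustrate the trailing-empties behaviour, compare the two forms on a line that ends with empty fields:

```java
public class SplitLimitDemo {
    public static void main(String[] args) {
        String line = "a,b,,";

        // No limit: trailing empty Strings are discarded.
        String[] lossy = line.split(",");
        System.out.println(lossy.length); // 2 -> ["a", "b"]

        // Limit of -1: the pattern is applied as many times as
        // possible and trailing empty Strings are kept.
        String[] full = line.split(",", -1);
        System.out.println(full.length);  // 4 -> ["a", "b", "", ""]
    }
}
```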

BTW: unless you're doing some complex/advanced transformations in Dataflow, you don't have to use a pipeline to load in your CSV files. You could just load it or read it directly from GCS.

Graham Polley
  • Thanks Graham, I am building towards a pipeline with transformations but starting very simple. Appreciate your answer and additional feedback. – R. Gault Jul 16 '17 at 16:17