I also suggest you look at the BigQuery API in gcloud-java. In gcloud-java you can use a TableDataWriteChannel to stream data to a BigQuery table.
See the following example (where JSON_CONTENT is a string of JSON):
    BigQuery bigquery = BigQueryOptions.defaultInstance().service();
    TableId tableId = TableId.of("dataset", "table");
    LoadConfiguration configuration = LoadConfiguration.builder(tableId)
        .formatOptions(FormatOptions.json())
        .build();
    try (TableDataWriteChannel channel = bigquery.writer(configuration)) {
      channel.write(
          ByteBuffer.wrap(JSON_CONTENT.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      // handle exception
    }
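In case it helps, here is a minimal sketch of how JSON_CONTENT could be put together. It assumes BigQuery's JSON load format is newline-delimited (one object per line), which is my understanding of what FormatOptions.json() selects. The class name, the helper names and the field names in the rows are hypothetical and only there for illustration; the sketch uses nothing but the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class JsonContentSketch {

    /** Joins already-serialized JSON objects into newline-delimited JSON, one object per line. */
    public static String toNewlineDelimited(List<String> jsonObjects) {
        StringBuilder sb = new StringBuilder();
        for (String object : jsonObjects) {
            sb.append(object).append('\n');
        }
        return sb.toString();
    }

    /** Wraps the UTF-8 bytes of the content, ready to be passed to a channel's write method. */
    public static ByteBuffer toBuffer(String jsonContent) {
        return ByteBuffer.wrap(jsonContent.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical rows; the field names must match the schema of your table.
        List<String> rows = Arrays.asList(
            "{\"name\": \"alice\", \"age\": 30}",
            "{\"name\": \"bob\", \"age\": 25}");
        String jsonContent = toNewlineDelimited(rows);
        ByteBuffer buffer = toBuffer(jsonContent);
        System.out.print(jsonContent);
        System.out.println(buffer.remaining() + " bytes ready to write");
    }
}
```

The resulting buffer is what you would hand to channel.write(...) in the example above. For real data I would serialize each row with a JSON library rather than by hand, so that escaping is handled for you.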
TableDataWriteChannel uses resumable upload to stream data to the BigQuery table, which makes it better suited to large files.
A TableDataWriteChannel can also be used to stream local files:
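    // Transfer the file in chunks of 2 MiB
    int chunkSize = 8 * 256 * 1024;
    BigQuery bigquery = BigQueryOptions.defaultInstance().service();
    // Same table as in the first example
    TableId tableId = TableId.of("dataset", "table");
    LoadConfiguration configuration = LoadConfiguration.builder(tableId)
        .formatOptions(FormatOptions.json())
        .build();
    try (FileChannel fileChannel = FileChannel.open(Paths.get("file.json"))) {
      WriteChannel writeChannel = bigquery.writer(configuration);
      long position = 0;
      // transferTo returns the number of bytes actually transferred,
      // which is 0 once the end of the file has been reached
      long written = fileChannel.transferTo(position, chunkSize, writeChannel);
      while (written > 0) {
        position += written;
        written = fileChannel.transferTo(position, chunkSize, writeChannel);
      }
      // Closing the channel completes the upload
      writeChannel.close();
    }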
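The transfer loop itself does not depend on BigQuery: it works with any WritableByteChannel. Below is a minimal, JDK-only sketch of the same pattern that you can run without credentials. An in-memory channel stands in for the BigQuery write channel, and the class name, method name and the tiny chunk size are my own choices for illustration, not part of gcloud-java.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedTransferSketch {

    /**
     * Copies the file at source into target, requesting at most chunkSize bytes
     * per transfer. Returns the total number of bytes copied.
     */
    public static long copyInChunks(Path source, WritableByteChannel target, int chunkSize)
            throws IOException {
        try (FileChannel fileChannel = FileChannel.open(source)) {
            long position = 0;
            long written;
            // transferTo may move fewer bytes than requested, so advance by the
            // actual count; it returns 0 once position reaches the end of the file.
            while ((written = fileChannel.transferTo(position, chunkSize, target)) > 0) {
                position += written;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data written to a temporary file.
        Path file = Files.createTempFile("rows", ".json");
        Files.write(file, "{\"name\": \"a\"}\n{\"name\": \"b\"}\n".getBytes(StandardCharsets.UTF_8));

        // In-memory stand-in for the channel returned by bigquery.writer(configuration).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel target = Channels.newChannel(sink)) {
            long copied = copyInChunks(file, target, 8);
            System.out.println(copied + " bytes copied");
        }
        Files.delete(file);
    }
}
```

With a real table you would pass the channel returned by bigquery.writer(configuration) as the target and use a much larger chunk size, as in the example above.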
For more gcloud-java-bigquery examples, have a look at BigQueryExample.