I am trying to execute a query over a table in BigQuery using its Java client library. I create a query `Job` and then read its results with `job.getQueryResults().iterateAll()`.
This works, but for large result sets (around 600k rows) it takes 80-120 seconds. BigQuery appears to return the rows in pages of 40-45k, and each page takes about 5-7 seconds to fetch.
I want to get the results faster. I found online that if you can get the temporary table BigQuery creates for the query job, reading the data from that table in Avro (or some other format) is much faster. However, I don't see a way to do that in the BigQuery API (version 1.124.7). Does anyone know how to do that in Java, or how else to fetch a large number of records faster? Any help is appreciated.
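A minimal sketch of the first half of that idea, assuming the query job has already finished. In the Java client, a query job's configuration is a `QueryJobConfiguration`. When the query did not name a destination, `getDestinationTable()` on the completed job should return the temporary table BigQuery wrote the results to. The class and method names (`DestinationTableLookup`, `destinationOf`) are hypothetical. `main` assumes default application credentials and takes the job ID as its first argument.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobConfiguration;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Hypothetical helper class: finds the table that holds a query job's results.
public class DestinationTableLookup {

    /** Returns the destination (possibly temporary) table from a query job's configuration. */
    static TableId destinationOf(JobConfiguration config) {
        if (!(config instanceof QueryJobConfiguration)) {
            throw new IllegalArgumentException("Not a query job: " + config.getType());
        }
        TableId dest = ((QueryJobConfiguration) config).getDestinationTable();
        if (dest == null) {
            throw new IllegalStateException("No destination table yet; wait for the job to finish");
        }
        return dest;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumes default credentials and the query job ID as the first argument
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        Job job = bigquery.getJob(JobId.of(args[0]));
        if (job == null) {
            throw new IllegalArgumentException("Job not found: " + args[0]);
        }
        // Block until the query has finished so its destination table is filled in
        Job done = job.waitFor();
        if (done == null || done.getStatus().getError() != null) {
            throw new IllegalStateException("Query job failed or disappeared: " + args[0]);
        }
        TableId tmp = destinationOf(done.getConfiguration());
        System.out.println(tmp.getProject() + "." + tmp.getDataset() + "." + tmp.getTable());
    }
}
```

The returned `TableId` can then be read like any other table. BigQuery keeps temporary result tables for only about 24 hours, so a read like this should happen soon after the query finishes.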
Code to read the table by exporting it to GCS (takes 20 s):
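```java
import com.google.cloud.RetryOption;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import org.threeten.bp.Duration;

Table table = bigQueryHelper.getBigQueryClient().getTable(TableId.of("project", "dataset", "table"));
String format = "CSV";
String gcsUrl = "gs://name/test.csv";
// Start an extract job that exports the table to Cloud Storage
Job job = table.extract(format, gcsUrl);
try {
    // Wait for the job to complete (first poll after 1 s, give up after 3 min)
    Job completedJob = job.waitFor(
            RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
            RetryOption.totalTimeout(Duration.ofMinutes(3)));
    if (completedJob != null && completedJob.getStatus().getError() == null) {
        // Job completed successfully
        log.info("job done");
    } else {
        // Job no longer exists (null) or finished with an error
        log.info("job has error: "
                + (completedJob == null ? "job not found" : completedJob.getStatus().getError()));
    }
} catch (InterruptedException e) {
    // Restore the interrupt flag so callers can see the wait was interrupted
    Thread.currentThread().interrupt();
}
```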
Code to read the same table using a query (takes 90 s):
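```java
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
// getQueryResults() waits for the job and can throw InterruptedException; declare or catch it
Job job = bigQueryHelper.getBigQueryClient().getJob(JobId.of(jobId));
for (FieldValueList row : job.getQueryResults().iterateAll()) {
    System.out.println(row);
}
```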
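For the Avro-based read described at the start of the question, one option is the BigQuery Storage Read API, which streams table rows as binary Avro (or Arrow) blocks instead of paged JSON. What follows is a minimal, single-stream sketch. It assumes the `google-cloud-bigquerystorage` (v1) and Apache Avro libraries are on the classpath. `StorageReadSketch`, `readApiTablePath`, `readAllRows`, and `billingProject` are hypothetical names. The `TableId` passed in could be the query job's temporary destination table (from `QueryJobConfiguration.getDestinationTable()` on the finished job) or an ordinary table.

```java
import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.storage.v1.AvroRows;
import com.google.cloud.bigquery.storage.v1.BigQueryReadClient;
import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1.DataFormat;
import com.google.cloud.bigquery.storage.v1.ReadRowsRequest;
import com.google.cloud.bigquery.storage.v1.ReadRowsResponse;
import com.google.cloud.bigquery.storage.v1.ReadSession;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical helper class: reads a table through the Storage Read API in Avro format.
public class StorageReadSketch {

    /** Formats a TableId as the resource path the Storage Read API expects. */
    static String readApiTablePath(TableId id) {
        return String.format("projects/%s/datasets/%s/tables/%s",
                id.getProject(), id.getDataset(), id.getTable());
    }

    /** Reads every row of the table through one Avro stream and returns the row count. */
    static long readAllRows(String billingProject, TableId table) throws IOException {
        try (BigQueryReadClient client = BigQueryReadClient.create()) {
            CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder()
                    .setParent("projects/" + billingProject)
                    .setReadSession(ReadSession.newBuilder()
                            .setTable(readApiTablePath(table))
                            .setDataFormat(DataFormat.AVRO))
                    // One stream keeps the sketch simple; several can be read in parallel
                    .setMaxStreamCount(1)
                    .build();
            ReadSession session = client.createReadSession(request);
            if (session.getStreamsCount() == 0) {
                return 0; // the server returns no streams for an empty table
            }

            // The session carries the Avro schema needed to decode the row blocks
            Schema schema = new Schema.Parser().parse(session.getAvroSchema().getSchema());
            GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
            BinaryDecoder decoder = null;
            GenericRecord row = null;
            long count = 0;

            ReadRowsRequest readRows = ReadRowsRequest.newBuilder()
                    .setReadStream(session.getStreams(0).getName())
                    .build();
            ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRows);
            for (ReadRowsResponse response : stream) {
                AvroRows avroRows = response.getAvroRows();
                // Each response holds a block of rows serialized back to back in Avro binary form
                decoder = DecoderFactory.get()
                        .binaryDecoder(avroRows.getSerializedBinaryRows().toByteArray(), decoder);
                while (!decoder.isEnd()) {
                    row = datumReader.read(row, decoder); // reuses the record object
                    count++;
                    // Process `row` here, e.g. row.get("column_name")
                }
            }
            return count;
        }
    }
}
```

A common way to go further is to raise `setMaxStreamCount` above 1 and read each entry of `session.getStreamsList()` on its own thread. The single stream here only keeps the example short. Whether this beats the 90 s `iterateAll()` loop for a 600k-row result would need to be measured.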