I am working on a Java application which uses Bigquery as the analytics engine. Was able to run query jobs (and get results) using the code on Insert a Query Job. Had to modify the code to use service account using this comment on stackoverflow.
Now, need to run an extract job to export a table to a bucket on GoogleStorage. Based on Exporting a Table, was able to modify the Java code to insert extract jobs (code below). When run, the extract job's status changes from PENDING to RUNNING to DONE. The problem is that no file is actually uploaded to the specified bucket.
Info that might be helpful:
- The
createAuthorizedClient
function returns a Bigquery instance and works for query jobs, so probably no issues with the service account, private key etc. - Also tried creating and running the insert job manually on google's api-explorer and the file is successfully created in the bucket. Using the same values for project, dataset, table and destination uri as in code so these should be correct.
Here is the code (pasting the complete file in case somebody else finds this useful):
import java.io.File;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationExtract;
import com.google.api.services.bigquery.model.JobReference;
import com.google.api.services.bigquery.model.TableReference;
public class BigQueryJavaGettingStarted {
private static final String PROJECT_ID = "123456789012";
private static final String DATASET_ID = "MY_DATASET_NAME";
private static final String TABLE_TO_EXPORT = "MY_TABLE_NAME";
private static final String SERVICE_ACCOUNT_ID = "123456789012-...@developer.gserviceaccount.com";
private static final File PRIVATE_KEY_FILE = new File("/path/to/privatekey.p12");
private static final String DESTINATION_URI = "gs://mybucket/file.csv";
private static final List<String> SCOPES = Arrays.asList(BigqueryScopes.BIGQUERY);
private static final HttpTransport TRANSPORT = new NetHttpTransport();
private static final JsonFactory JSON_FACTORY = new JacksonFactory();
public static void main (String[] args) {
try {
executeExtractJob();
} catch (Exception e) {
e.printStackTrace();
}
}
public static final void executeExtractJob() throws IOException, InterruptedException, GeneralSecurityException {
Bigquery bigquery = createAuthorizedClient();
//Create a new Extract job
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationExtract extractConfig = new JobConfigurationExtract();
TableReference sourceTable = new TableReference();
sourceTable.setProjectId(PROJECT_ID).setDatasetId(DATASET_ID).setTableId(TABLE_TO_EXPORT);
extractConfig.setSourceTable(sourceTable);
extractConfig.setDestinationUri(DESTINATION_URI);
config.setExtract(extractConfig);
job.setConfiguration(config);
//Insert/Execute the created extract job
Insert insert = bigquery.jobs().insert(PROJECT_ID, job);
insert.setProjectId(PROJECT_ID);
JobReference jobId = insert.execute().getJobReference();
//Now check to see if the job has successfuly completed (Optional for extract jobs?)
long startTime = System.currentTimeMillis();
long elapsedTime;
while (true) {
Job pollJob = bigquery.jobs().get(PROJECT_ID, jobId.getJobId()).execute();
elapsedTime = System.currentTimeMillis() - startTime;
System.out.format("Job status (%dms) %s: %s\n", elapsedTime, jobId.getJobId(), pollJob.getStatus().getState());
if (pollJob.getStatus().getState().equals("DONE")) {
break;
}
//Wait a second before rechecking job status
Thread.sleep(1000);
}
}
private static Bigquery createAuthorizedClient() throws GeneralSecurityException, IOException {
GoogleCredential credential = new GoogleCredential.Builder()
.setTransport(TRANSPORT)
.setJsonFactory(JSON_FACTORY)
.setServiceAccountScopes(SCOPES)
.setServiceAccountId(SERVICE_ACCOUNT_ID)
.setServiceAccountPrivateKeyFromP12File(PRIVATE_KEY_FILE)
.build();
return Bigquery.builder(TRANSPORT, JSON_FACTORY)
.setApplicationName("My Reports")
.setHttpRequestInitializer(credential)
.build();
}
}
Here is the output:
Job status (337ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
...
Job status (9186ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
Job status (10798ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
...
Job status (53952ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
Job status (55531ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: DONE
It is a small table (about 4MB) so the job taking about a minute seems ok. Have no idea why no file is created in the bucket OR how to go about debugging this. Any help would be appreciated.
As Craig pointed out, printed the status.errorResult() and status.errors() values.
- getErrorResults(): {"message":"Backend error. Job aborted.","reason":"internalError"}
- getErrors(): null