I am working on a Java application which uses BigQuery as the analytics engine. I was able to run query jobs (and get results) using the code from Insert a Query Job, modified to use a service account based on this comment on Stack Overflow.

Now I need to run an extract job to export a table to a bucket on Google Storage. Based on Exporting a Table, I was able to modify the Java code to insert extract jobs (code below). When run, the extract job's status changes from PENDING to RUNNING to DONE. The problem is that no file is actually uploaded to the specified bucket.

Info that might be helpful:

  • The createAuthorizedClient function returns a Bigquery instance and works for query jobs, so there are probably no issues with the service account, private key, etc.
  • I also tried creating and running the extract job manually on Google's API explorer, and the file is successfully created in the bucket. I am using the same values for project, dataset, table and destination URI as in the code, so these should be correct.

Here is the code (pasting the complete file in case somebody else finds this useful):

import java.io.File;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationExtract;
import com.google.api.services.bigquery.model.JobReference;
import com.google.api.services.bigquery.model.TableReference;

public class BigQueryJavaGettingStarted {
    
    private static final String PROJECT_ID = "123456789012";
    private static final String DATASET_ID = "MY_DATASET_NAME";
    private static final String TABLE_TO_EXPORT = "MY_TABLE_NAME";
    private static final String SERVICE_ACCOUNT_ID = "123456789012-...@developer.gserviceaccount.com";
    private static final File PRIVATE_KEY_FILE = new File("/path/to/privatekey.p12");
    private static final String DESTINATION_URI = "gs://mybucket/file.csv";

    private static final List<String> SCOPES =  Arrays.asList(BigqueryScopes.BIGQUERY);
    private static final HttpTransport TRANSPORT = new NetHttpTransport();
    private static final JsonFactory JSON_FACTORY = new JacksonFactory();
    
    public static void main (String[] args) {
        try {
            executeExtractJob();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static final void executeExtractJob() throws IOException, InterruptedException, GeneralSecurityException {
        Bigquery bigquery = createAuthorizedClient();
        
        //Create a new Extract job
        Job job = new Job();
        JobConfiguration config = new JobConfiguration();
        JobConfigurationExtract extractConfig = new JobConfigurationExtract();
        TableReference sourceTable = new TableReference();

        sourceTable.setProjectId(PROJECT_ID).setDatasetId(DATASET_ID).setTableId(TABLE_TO_EXPORT);
        extractConfig.setSourceTable(sourceTable);
        extractConfig.setDestinationUri(DESTINATION_URI);
        config.setExtract(extractConfig);
        job.setConfiguration(config);

        //Insert/Execute the created extract job
        Insert insert = bigquery.jobs().insert(PROJECT_ID, job);
        insert.setProjectId(PROJECT_ID);
        JobReference jobId = insert.execute().getJobReference();
        
        //Now check to see if the job has successfully completed (Optional for extract jobs?)
        long startTime = System.currentTimeMillis();
        long elapsedTime;
        while (true) {
            Job pollJob = bigquery.jobs().get(PROJECT_ID, jobId.getJobId()).execute();
            elapsedTime = System.currentTimeMillis() - startTime;
            System.out.format("Job status (%dms) %s: %s\n", elapsedTime, jobId.getJobId(), pollJob.getStatus().getState());
            if (pollJob.getStatus().getState().equals("DONE")) {
                break;
            }
            //Wait a second before rechecking job status
            Thread.sleep(1000);
        }
        
    }

    private static Bigquery createAuthorizedClient() throws GeneralSecurityException, IOException {
        GoogleCredential credential = new GoogleCredential.Builder()
            .setTransport(TRANSPORT)
            .setJsonFactory(JSON_FACTORY)
            .setServiceAccountScopes(SCOPES)
            .setServiceAccountId(SERVICE_ACCOUNT_ID)
            .setServiceAccountPrivateKeyFromP12File(PRIVATE_KEY_FILE)
            .build();
        
        return Bigquery.builder(TRANSPORT, JSON_FACTORY)
            .setApplicationName("My Reports")
            .setHttpRequestInitializer(credential)
            .build();
    }
}

Here is the output:

Job status (337ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
...
Job status (9186ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
Job status (10798ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
...
Job status (53952ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
Job status (55531ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: DONE

It is a small table (about 4 MB), so the job taking about a minute seems OK. I have no idea why no file is created in the bucket, or how to go about debugging this. Any help would be appreciated.

As Craig pointed out, I printed the status.errorResult() and status.errors() values:

  • getErrorResults(): {"message":"Backend error. Job aborted.","reason":"internalError"}
  • getErrors(): null
mtariq

4 Answers


I believe the problem is with the bucket name you're using -- mybucket above is just an example, you need to replace that with a bucket you actually own in Google Storage. If you've never used GS before, the intro docs will help.

Your second question was how to debug this -- I'd recommend looking at the returned Job object once the status is set to DONE. Jobs that end in an error still make it to DONE state, the difference is that they have an error result attached, so job.getStatus().hasErrorResult() should be true. (I've never used the Java client libraries, so I'm guessing at that method name.) You can find more information in the jobs docs.
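For illustration, here is a minimal sketch of that check built on the poll loop from the question. It uses JobStatus.getErrorResult() and getErrors() from the model classes (rather than the guessed hasErrorResult()); the exact getter names should be verified against the client library version in use.

Job pollJob = bigquery.jobs().get(PROJECT_ID, jobId.getJobId()).execute();
while (!"DONE".equals(pollJob.getStatus().getState())) {
    //Job not finished yet; wait a second and poll again
    Thread.sleep(1000);
    pollJob = bigquery.jobs().get(PROJECT_ID, jobId.getJobId()).execute();
}
if (pollJob.getStatus().getErrorResult() != null) {
    //The job reached DONE but failed; print the primary error and the full error list
    System.out.println("Extract failed: " + pollJob.getStatus().getErrorResult().getMessage());
    System.out.println("All errors: " + pollJob.getStatus().getErrors());
} else {
    System.out.println("Extract succeeded, file written to " + DESTINATION_URI);
}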

Craig Citro
  • Thanks for the quick response. 1- I am using the actual bucket name and not 'mybucket' so that is not the issue. 2- Thanks for pointing out the errorResult. Printed that out. Here is the output (for the poll job): getErrorResults(): {"message":"Backend error. Job aborted.","reason":"internalError"} getErrors(): null Any ideas? – mtariq Jul 30 '12 at 07:29
  • Aha -- rereading the question, I wonder if you have access to the bucket, but the service account you're using doesn't? Just as a check, can you set the bucket to world writeable and re-run the job, to at least narrow it down? – Craig Citro Jul 30 '12 at 22:11
  • Thanks Craig. Yes, it was a permissions issue. Since I can only accept one answer, am accepting Jordan's answer below. – mtariq Jul 31 '12 at 00:28
  • For updating the bucket acl, you need to do a `gsutil getacl gs://mybucket >acls.xml`, add an entry with `UserByEmail` scope, and then `gsutil setacl`. – Craig Citro Jul 31 '12 at 02:15

It looks like there was an access denied error writing to the path gs://pixalate_test/from_java.csv. Can you make sure that the user performing the export job has write access to the bucket (and that the file doesn't already exist)? I've filed an internal BigQuery bug on this issue ... we should give a better error in this situation.

Jordan Tigani
  • Thanks Jordan. Changed bucket permissions to public and extract job works perfectly. Need to figure out how to add the service account user to the acl for the bucket. Will update the comment (for future searches) when I figure that out. (right now used: `gsutil setacl public-read-write gs://`) – mtariq Jul 31 '12 at 00:24

One more difference I notice is that you are not setting the job type with config.setJobType(JOB_TYPE), where the constant is defined as private static final String JOB_TYPE = "extract". Also, for JSON output you need to set the destination format as well.
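For reference, a sketch of how the destination format could be requested on the question's extract configuration. setDestinationFormat is the setter on JobConfigurationExtract, and "NEWLINE_DELIMITED_JSON" is the documented value for JSON exports (CSV is the default); the job-type setter mentioned above is left out here and should be checked against the client library version in use.

JobConfigurationExtract extractConfig = new JobConfigurationExtract();
extractConfig.setSourceTable(sourceTable);
//Placeholder destination; replace with a bucket the service account can write to
extractConfig.setDestinationUri("gs://mybucket/file.json");
//Request newline-delimited JSON instead of the default CSV
extractConfig.setDestinationFormat("NEWLINE_DELIMITED_JSON");
config.setExtract(extractConfig);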


I had the same problem, but it turned out that I had typed the name of the table wrong. However, Google did not generate an error message saying "the table does not exist", which would have helped me locate my problem.

Thanks!

Jie