
I'm using Google Cloud Dataflow, and when I execute this code:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public static void main(String[] args) {

    String query = "SELECT * FROM [*****.*****]";

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

    PCollection<TableRow> lines = p.apply(BigQueryIO.read().fromQuery(query));

    p.run();
}

I get this exception:

(332b4f3b83bd3397): java.io.IOException: Query job beam_job_d1772eb4136d4982be55be20d173f63d_testradiateurmodegfcvsoasc07281159145481871-query failed, status: {
    "errorResult" : {
        "message" : "Cannot read and write in different locations: source: EU, destination: US",
        "reason" : "invalid"
    },
    "errors" : [ {
        "message" : "Cannot read and write in different locations: source: EU, destination: US",
        "reason" : "invalid"
    }],
    "state" : "DONE"
}.
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.executeQuery(BigQueryQuerySource.java:173)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:120)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:87)
    at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:261)
    at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:209)
    at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:184)
    at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:161)
    at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:47)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:341)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:297)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
    at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have already read these posts: 37298504, 42135002, and https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/405, but none of the solutions worked for me.

For more information:

  • The BigQuery table is located in the EU.
  • I tried starting the job with --zone=europe-west1-b and --region=europe-west1-b (a minimal sketch of how I set these options is shown below, after this list).
  • I am using the DataflowRunner.

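For clarity, here is roughly how those options can be set programmatically instead of on the command line. This is only a minimal sketch, not the exact code of my job: the setRegion() call assumes a Beam 2.x DataflowPipelineOptions, and older SDK versions may not expose it.

// Minimal sketch: set the worker zone / regional endpoint in code.
// Assumes org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
// setRegion() may not be available on older SDK versions.
DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
options.setZone("europe-west1-b");    // the EU worker zone I tried
options.setRegion("europe-west1");    // the region itself has no "-b" suffix
Pipeline p = Pipeline.create(options);
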
When I go to the BigQuery Web UI, I see these temporary datasets.

EDIT: I solved my problem by using version 1.9.0 of the Dataflow SDK.

  • Just to double check, which version of the Dataflow SDK are you running? 1.7.0+, which fixes https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/405? Also, is it possible to inspect the temporary datasets? Are they in the US region? – Alex Amato Aug 01 '17 at 20:10
  • @AlexAmato I solved my problem by using version 1.9.0 of the Dataflow SDK. I think the issue described in this post, https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/405, persists in version 2.0 of the Dataflow SDK. – PH. Alain Aug 04 '17 at 11:21
  • Glad that it's working now :) – Alex Amato Aug 04 '17 at 18:38

0 Answers