I'm using Google Cloud Dataflow, and when I execute this code:
public static void main(String[] args) {
    // Legacy SQL query against the source table (dataset/table redacted)
    String query = "SELECT * FROM [*****.*****]";
    // Build the pipeline options from the command-line arguments
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());
    // Read the query results from BigQuery as TableRow records
    PCollection<TableRow> lines = p.apply(BigQueryIO.read().fromQuery(query));
    p.run();
}
I get the following error:
(332b4f3b83bd3397): java.io.IOException: Query job beam_job_d1772eb4136d4982be55be20d173f63d_testradiateurmodegfcvsoasc07281159145481871-query failed, status: {
"errorResult" : {
"message" : "Cannot read and write in different locations: source: EU, destination: US",
"reason" : "invalid"
},
"errors" : [ {
"message" : "Cannot read and write in different locations: source: EU, destination: US",
"reason" : "invalid"
}],
"state" : "DONE"
}.
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.executeQuery(BigQueryQuerySource.java:173)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:120)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:87)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:261)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:209)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:184)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:161)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:47)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:341)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:297)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I have already read these posts: 37298504, 42135002, and https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/405, but none of the solutions work for me.
For more information:
- The BigQuery table is located in the EU
- I tried starting the job with --zone=europe-west1-b and region=europe-west1-b (see the sketch after this list)
- I am using the DataflowRunner
- When I go to the BigQuery Web UI, I see these temporary datasets
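Something I have not been able to verify yet is whether explicitly pinning every temporary and staging resource to the EU in the pipeline options avoids the US default. A minimal sketch of what I mean, against the Beam-based 2.x SDK; the bucket name gs://my-eu-bucket and the region value are placeholders:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadFromEuTable {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    // Placeholder values: the bucket would need to be created in the EU so that
    // the staging files and temporary BigQuery exports live in the same
    // location as the source table.
    options.setRegion("europe-west1");
    options.setTempLocation("gs://my-eu-bucket/temp");
    options.setStagingLocation("gs://my-eu-bucket/staging");

    String query = "SELECT * FROM [*****.*****]";
    Pipeline p = Pipeline.create(options);
    PCollection<TableRow> rows = p.apply(BigQueryIO.read().fromQuery(query));
    p.run();
  }
}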
EDIT: I solved my problem by using version 1.9.0 of the Dataflow SDK.
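For anyone hitting the same error: with the 1.9.0 SDK (the google-cloud-dataflow-java-sdk-all artifact) the pipeline is written against the com.google.cloud.dataflow.sdk packages instead of org.apache.beam. Roughly, with the table reference still a placeholder:

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class ReadFromEuTableSdk1 {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());
    String query = "SELECT * FROM [*****.*****]";
    // In the 1.x SDK the source is built from the nested Read class
    PCollection<TableRow> rows = p.apply(BigQueryIO.Read.fromQuery(query));
    p.run();
  }
}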