
I am trying to pass data from Cloud Pub/Sub to Google Cloud Storage. When I use the DataflowRunner, the pipeline gets published to Google Cloud Dataflow and works as expected. However, for testing I'd like the pipeline to run locally (while still reading from Cloud Pub/Sub and writing to Cloud Storage). When I use the DirectRunner, the process prints `INFO:apache_beam.runners.direct.direct_runner:Running pipeline with DirectRunner.` but does nothing when a new message is published to Pub/Sub.

I am executing the pipeline with this command:

python dev_radim_dataflow_gcs_direct.py ^
  --project=<GCP_PROJECT> ^
  --region="europe-west3" ^
  --input_subscription="projects/data-uat-280814/subscriptions/dev-radim-dataflow" ^
  --output_path=gs://dev_radim/dataflow_dest_local/ ^
  --runner=DirectRunner ^
  --window_size=1 ^
  --temp_location=gs://dev_radim/dataflow_temp_local/

The full dev_radim_dataflow_gcs_direct.py file is here: https://pastebin.com/W7VphH5A

Any ideas why the message doesn't make it from Pub/Sub to GCS?
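For reference, the write step in pipelines like this typically composes one GCS object path per window. A minimal sketch of that path logic (the helper name and timestamp format are my own choices; the linked file may differ):

```python
from datetime import datetime, timezone

def window_output_path(output_prefix, window_start, window_end):
    """Build one output object path per window.

    window_start / window_end are Unix timestamps in seconds,
    as Beam's IntervalWindow boundaries expose them.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.fromtimestamp(window_start, tz=timezone.utc).strftime(fmt)
    end = datetime.fromtimestamp(window_end, tz=timezone.utc).strftime(fmt)
    return f"{output_prefix}{start}-{end}"

print(window_output_path("gs://dev_radim/dataflow_dest_local/", 0, 60))
# -> gs://dev_radim/dataflow_dest_local/1970-01-01 00:00:00-1970-01-01 00:01:00
```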

RadRuss
I ran your code using the command you provided. The Pub/Sub message I sent is in JSON format; the `INFO:apache_beam.runners.direct.direct_runner:Running pipeline with DirectRunner.` message is also shown, but when I checked the GCS bucket, the output is written there and keeps streaming continuously. Are you expecting a different result? – Ricco D Feb 23 '21 at 08:11
That's exactly what I expect. In light of your findings I went through the whole process and found that the issue was another Dataflow job connected to the same subscription. Once I created a new subscription for the job with DirectRunner, everything works as expected. Thanks a lot!! – RadRuss Feb 23 '21 at 08:43

1 Answer


Posting the comment by @RadRuss as an answer, since this could happen to other people as well:

There was another consumer reading from the same subscription, so no messages ever got to the pipeline running in the DirectRunner. In this case the consumer was a Dataflow job, but it could be anything.
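A Pub/Sub subscription delivers each message to only one of its attached subscribers, so a competing consumer silently drains messages the DirectRunner pipeline never sees. The plain-Python sketch below (a local queue with two worker threads, not Pub/Sub itself) mimics that exactly-once-per-subscription delivery:

```python
# Illustration only: two consumers sharing one queue, as two jobs
# share one Pub/Sub subscription. Each message goes to exactly one
# consumer, so the consumer you are watching may receive nothing.
import queue
import threading

def run_consumers(messages, n_consumers=2):
    q = queue.Queue()
    for m in messages:
        q.put(m)
    received = [[] for _ in range(n_consumers)]

    def consume(idx):
        while True:
            try:
                m = q.get_nowait()
            except queue.Empty:
                return  # queue drained, this consumer stops
            received[idx].append(m)

    threads = [threading.Thread(target=consume, args=(i,))
               for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

if __name__ == "__main__":
    got = run_consumers([f"msg-{i}" for i in range(10)])
    # All 10 messages are delivered exactly once, split between
    # the two consumers -- possibly 10/0 if one drains the queue first.
    print([len(g) for g in got])
```

For local testing, a dedicated subscription (e.g. created with `gcloud pubsub subscriptions create`) avoids sharing messages with a running Dataflow job.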

Kenn Knowles