I have an existing system where data is published to a Pub/Sub topic, read by a Cloud Functions subscriber, and pushed to BigQuery for storage (no additional transformation is done in the subscriber CF).

Is it a good idea to replace my subscriber CF with a Dataflow streaming job using the Pub/Sub-to-BigQuery template? What are the pros and cons of each approach?

Darshan Naik
  • This is a broad question, we'd need to know more about what kind of processing you're doing, at what frequency, volume, etc. Are you experiencing a specific problem with your current setup? – Travis Webb Nov 09 '19 at 03:31
  • I do not have any issues with the CF solution as such; I thought Dataflow would be a good addition to our system. Our Pub/Sub topic currently receives around 1000 JSON messages/sec, each about 2 kB. – Darshan Naik Nov 09 '19 at 04:31
  • What method are you inserting data into BQ? Have you reviewed pricing for that method? Have you reviewed the insert quotas before posting this question? Edit your question with specific details on your environment and design. – John Hanley Nov 09 '19 at 06:22

1 Answer

It all depends on your use case and your data rate.

  • With sparse data published to the Pub/Sub topic, Cloud Functions work well and cost almost nothing.
  • With sustained traffic, you have to watch your processing time. A simple Dataflow job costs only one running VM (a basic n1-standard-1). The hourly price of Cloud Functions is higher than that of one running n1-standard-1 VM. When messages arrive concurrently, several function instances are spawned, which increases the processing cost.
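To get a rough feel for how many instances "several" might be, here is a minimal sketch based on Little's law (messages in flight = arrival rate × processing time). The 1000 messages/sec figure comes from the question's comments; the 0.25 s processing time and the per-instance concurrency of 50 are purely illustrative assumptions, and `estimated_instances` is a hypothetical helper, not part of any Google Cloud library.

```python
import math


def estimated_instances(messages_per_second: float,
                        seconds_per_message: float,
                        concurrency_per_instance: int = 1) -> int:
    """Estimate how many instances are busy at steady state.

    Little's law: messages in flight = arrival rate * processing time.
    Dividing by the number of messages one instance handles at once
    gives the instance count (rounded up, never below 1).
    """
    in_flight = messages_per_second * seconds_per_message
    return max(1, math.ceil(in_flight / concurrency_per_instance))


# 1000 msg/s is from the question; 0.25 s per message is an assumed latency.
one_per_instance = estimated_instances(1000, 0.25)
# Assumed runtime where one instance handles 50 messages concurrently.
shared_instances = estimated_instances(1000, 0.25, concurrency_per_instance=50)

print(one_per_instance)   # 250
print(shared_instances)   # 5
```

The estimate ignores cold starts and traffic bursts, so treat it as an order of magnitude to plug into the pricing calculator rather than a billing forecast.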

You also have to take into account how easy a function is to deploy (as opposed to Dataflow, where you have to drain your pipeline, stop it, and relaunch it) and the ability to do much more, over a longer period of time, with Dataflow (a function is limited in processing capacity, and the processing of each message can't exceed 9 minutes).

Depending on where your project is heading, one solution or the other may be the better fit.

As a bonus, here is a third alternative: Cloud Run. Cloud Run is almost as easy as a function to update and deploy, the processing duration is a little longer (15 minutes per message), and you can process several messages on the same instance, so the pricing can be far more attractive than with functions because the instance cost is shared across those messages. If you are interested, have a look at this article that I wrote.
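As an illustration of what such a service has to do per message, here is a minimal sketch of decoding a Pub/Sub push request body, which wraps the base64-encoded payload in a JSON envelope under `message.data`. The names `decode_push_envelope` and `handle_push` are hypothetical, and the `insert_rows` callable is a stand-in for whatever BigQuery write you use (for example the Python client's `insert_rows_json`); the HTTP server wiring is left out.

```python
import base64
import json
from typing import Callable, List


def decode_push_envelope(body: bytes) -> dict:
    """Extract the published JSON document from a Pub/Sub push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # Pub/Sub base64-encodes the published bytes in the "data" field.
    payload = base64.b64decode(message["data"]).decode("utf-8")
    return json.loads(payload)


def handle_push(body: bytes, insert_rows: Callable[[List[dict]], None]) -> int:
    """Decode one push request and forward the row; return an HTTP status.

    A 2xx status acknowledges the message; any other status makes
    Pub/Sub redeliver it later.
    """
    try:
        row = decode_push_envelope(body)
    except (KeyError, ValueError):
        return 400  # malformed envelope or payload
    insert_rows([row])
    return 204


# Usage with a fake request body and an in-memory "table".
sample = json.dumps({
    "message": {
        "data": base64.b64encode(b'{"sensor": "a1", "value": 3}').decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/example/subscriptions/example-sub",
}).encode("utf-8")

table: List[dict] = []
status = handle_push(sample, table.extend)
print(status, table)
```

Because the handler holds no per-request state, one instance can run it for many requests at the same time, which is where the cost sharing comes from.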

guillaume blaquiere