
I'm relatively new to GCP and just starting to set up and evaluate my organization's architecture on GCP.

Scenario:
Data will flow into a Pub/Sub topic (high frequency, low amount of data). The goal is to move that data into Bigtable. From my understanding, you can do that either by having a Cloud Function trigger on the topic or with Dataflow.

Now, I have previous experience with Cloud Functions, which I am satisfied with, so that would be my pick.

I fail to see the benefit of choosing one over the other. So my question is: when should I choose which of these products?

Thanks

Tsuni

2 Answers


Both solutions could work. Dataflow will scale better if your Pub/Sub traffic grows to large amounts of data, but Cloud Functions should work fine for low volumes; I would look at this page (especially the rate-limit section) to make sure you stay within the Cloud Functions quotas: https://cloud.google.com/functions/quotas

Another thing to consider is that Dataflow can guarantee exactly-once processing of your data, so that no duplicates end up in Bigtable. Cloud Functions will not do this for you out of the box; Pub/Sub delivers messages at least once, so your function may run more than once for the same message. If you go with a functions approach, you will want to make sure that the Pub/Sub message deterministically determines which Bigtable cell is written to; that way, if the function gets retried several times, the same data will simply overwrite the same Bigtable cell.
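A minimal sketch of such an idempotent function in Python, using the google-cloud-bigtable client. The project, instance, table, and column-family names are placeholders, and it assumes each message body is JSON with a stable id field:

```python
import base64
import datetime
import json

from google.cloud import bigtable

# Placeholder names -- substitute your own project, instance, and table.
client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("events")


def pubsub_to_bigtable(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    The row key and cell timestamp are derived from the message itself,
    so a redelivered message overwrites the same cell instead of
    creating a duplicate.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    # Assumes each event carries a stable business key in payload["id"].
    row_key = str(payload["id"]).encode("utf-8")

    # Use the event timestamp, not "now", so a retry writes the same
    # cell version rather than adding a new one.
    event_time = datetime.datetime.fromisoformat(
        context.timestamp.replace("Z", "+00:00"))

    row = table.direct_row(row_key)
    row.set_cell("cf1", b"payload",
                 json.dumps(payload).encode("utf-8"),
                 timestamp=event_time)
    row.commit()
```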

Reuven Lax

Your needs sound relatively straightforward, and Dataflow may be overkill for what you're trying to do. If Cloud Functions do what you need, then maybe stick with that. Often I find that simplicity is key when it comes to maintainability.

However, when you need to perform transformations, like merging these events by user before storing them in Bigtable, that's where Dataflow really shines:

https://beam.apache.org/documentation/programming-guide/#groupbykey
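
For reference, a GroupByKey on an unbounded Pub/Sub source has to be windowed first. Here is a minimal Beam Python sketch, assuming JSON events with a user_id field and a placeholder topic name (the Bigtable write is stubbed out with a print):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


def key_by_user(raw):
    # Pub/Sub messages arrive as bytes; assume a JSON body with a user_id.
    event = json.loads(raw.decode("utf-8"))
    return event["user_id"], event


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/events")
            | "KeyByUser" >> beam.Map(key_by_user)
            # GroupByKey on a streaming source needs a window to be well-defined.
            | "Window" >> beam.WindowInto(window.FixedWindows(60))
            | "GroupByUser" >> beam.GroupByKey()
            # Merge each user's events for the window into a single record.
            | "Merge" >> beam.MapTuple(
                lambda user, events: {"user": user, "events": list(events)})
            | "Output" >> beam.Map(print)  # replace with a Bigtable write
        )


if __name__ == "__main__":
    run()
```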

Alex