
I'm building an app that is constantly appending to a buffer while many readers consume from this buffer independently (write-once-read-many / WORM). At first I thought of using Apache Kafka, but as I prefer an as-a-service option I started investigating AWS Kinesis Streams + KCL and it seems I can accomplish this task with them.

Basically I need 2 features: ordering (the events must be read in the same order by all readers) and the ability to choose the offset in the buffer from where the reader starts consuming onwards.
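To make the two requirements concrete, here is a minimal in-memory sketch in Python. It is not tied to Kinesis, Kafka, or Pub/Sub, and the class and method names are hypothetical. Every reader sees the same append order, and each reader chooses its own starting offset. `trim_before` stands in for a retention window such as Kinesis's: it drops old records and leaves the offsets of the remaining records unchanged.

```python
from typing import List


class AppendOnlyLog:
    """Hypothetical in-memory write-once-read-many buffer (illustration only)."""

    def __init__(self) -> None:
        self._base = 0                 # offset of the oldest record still retained
        self._records: List[str] = []  # retained records, in append order

    def append(self, record: str) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return self._base + len(self._records) - 1

    def trim_before(self, offset: int) -> None:
        """Simulate retention expiry: drop records older than `offset`.

        Offsets of the remaining records do not change.
        """
        drop = max(0, min(offset - self._base, len(self._records)))
        del self._records[:drop]
        self._base += drop

    def read_from(self, offset: int) -> List[str]:
        """Return every retained record from `offset` onwards, in append order."""
        if offset < self._base:
            raise ValueError(
                f"offset {offset} has expired; oldest retained offset is {self._base}"
            )
        return self._records[offset - self._base:]


log = AppendOnlyLog()
for event in ["a", "b", "c", "d"]:
    log.append(event)

reader_1 = log.read_from(0)  # starts at the beginning
reader_2 = log.read_from(2)  # starts at a chosen offset
log.trim_before(1)           # record "a" falls out of retention
reader_3 = log.read_from(1)  # offset 1 still refers to "b"
```

Readers never modify the log, so any number of them can consume independently; the only per-reader state is the offset each one starts from.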

Now I'm also evaluating Google Cloud Platform. From the documentation, Google Pub/Sub seems to be suggested as the equivalent of AWS Kinesis Streams, but at a more detailed level the two products look quite different:

  • Kinesis guarantees ordering inside a shard, while on Pub/Sub ordering is on a best-effort basis;
  • Kinesis keeps the whole buffer (limited to a maximum of 7 days) available to readers, which can use an offset to select their starting position, while on Pub/Sub only the messages published after the subscription is created are available for consumption.

If I got it right, PubSub cannot be considered a Kinesis equivalent. Perhaps if used together with Google Dataflow? I must confess that I still can't see how.

So, is PubSub an alternative to Kinesis? If not, is there a Google Cloud Product that would fulfill my requirements?

Thanks!

Renan
  • That is what I could see as well: Pub/Sub + Dataflow is not really equivalent to Kinesis. While I have used Kinesis extensively, I don't see such documentation or functionality around Pub/Sub and Dataflow. The two might be a bit far apart. – Kannaiyan Sep 11 '17 at 22:06
  • The post at https://cloud.google.com/blog/big-data/2016/09/apache-kafka-for-gcp-users-connectors-for-pubsub-dataflow-and-bigquery just made me a little more confused. It implies (subtly) that PubSub is an alternative to Kafka, but I still don't see the same capabilities. – Renan Sep 12 '17 at 00:40
  • With Pub/Sub you need to add the ordering information in the message payload. This may or may not be an issue with your application. – gdahlm Sep 12 '17 at 20:51
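A minimal sketch of what that comment describes, assuming a single publisher that stamps each payload with a consecutive sequence number. The `seq` field name, `stamp_with_sequence`, and the `Reorderer` class are hypothetical, not part of any Pub/Sub API. The consumer buffers messages that arrive early and releases them only in sequence order; because Pub/Sub may redeliver messages (at-least-once delivery), any sequence number that was already released or buffered is dropped.

```python
from typing import Dict, Iterable, List


def stamp_with_sequence(events: Iterable[str]) -> List[Dict[str, object]]:
    """Publisher side: put ordering information into each payload (hypothetical 'seq' field)."""
    return [{"seq": i, "data": event} for i, event in enumerate(events)]


class Reorderer:
    """Consumer side: release payloads strictly in 'seq' order."""

    def __init__(self, next_seq: int = 0) -> None:
        self._next = next_seq               # next sequence number we may release
        self._pending: Dict[int, str] = {}  # early arrivals, keyed by sequence number

    def receive(self, message: Dict[str, object]) -> List[str]:
        """Accept one message; return the payloads that are now in order."""
        seq = int(message["seq"])
        if seq < self._next or seq in self._pending:
            return []  # already released or already buffered: a redelivery
        self._pending[seq] = str(message["data"])
        released: List[str] = []
        while self._next in self._pending:
            released.append(self._pending.pop(self._next))
            self._next += 1
        return released


messages = stamp_with_sequence(["a", "b", "c"])
# Out-of-order arrival, including one duplicate delivery of "a".
arrivals = [messages[2], messages[0], messages[0], messages[1]]

reorderer = Reorderer()
delivered: List[str] = []
for msg in arrivals:
    delivered.extend(reorderer.receive(msg))
```

With several publishers a single global counter is not enough; you would need something like a separate sequence per publisher (or per key), with one `Reorderer` per sequence.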

2 Answers


A rather convoluted solution, but it might help:

  • Push your events using Pub/Sub to a single topic. At this point they will be unordered.
  • Create a Cloud Dataflow streaming pipeline that reads from the Pub/Sub topic. Have it do streaming writes to BigQuery, adding a timestamp to each table entry.
  • Have your readers query the BigQuery table, ordering by timestamp to get a consistent order. You can use ROW_NUMBER as your offset.
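The query side of this approach can be sketched in plain Python; the docstring shows the BigQuery standard SQL query it is meant to mirror. The table name `events` and the columns `ingest_ts` and `event` are hypothetical. The tie-breaker on `event` is an added assumption: without one, rows with equal timestamps could be numbered differently from one query to the next. The offsets are also only stable as long as no row is inserted with a timestamp earlier than rows a reader has already consumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Row:
    event: str
    ingest_ts: float  # timestamp added by the Dataflow step (hypothetical column)


def rows_with_offsets(table: List[Row]) -> List[Tuple[int, str]]:
    """Mirror of:

        SELECT ROW_NUMBER() OVER (ORDER BY ingest_ts, event) AS row_offset, event
        FROM events
    """
    ordered = sorted(table, key=lambda r: (r.ingest_ts, r.event))
    # ROW_NUMBER() is 1-based, hence start=1.
    return [(n, row.event) for n, row in enumerate(ordered, start=1)]


def read_from(table: List[Row], offset: int) -> List[str]:
    """What a reader starting at `offset` sees: rows whose row_offset >= offset."""
    return [event for n, event in rows_with_offsets(table) if n >= offset]


# Rows as they might land in the table: unordered arrival, timestamped on ingest.
events = [Row("b", 2.0), Row("a", 1.0), Row("c", 3.0)]
```

Every reader that runs the same ordered query gets the same sequence, which is what provides the consistent order across readers.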

Hope that helps.

HJED
  • Probably works, but as you said that's a lot of work. At this point I'd rather install Kafka on compute instances. But thank you for the suggestion. – Renan Sep 20 '17 at 20:20
  • @Renan if you are not up for implementing one of the [recommended approaches for ordering messages in Pub/Sub](https://cloud.google.com/pubsub/docs/subscriber#at-least-once-delivery), then your approach of hosting [Kafka on Compute Engine](https://pantheon.corp.google.com/launcher/details/bitnami-launchpad/kafka?project=javatester-1002&organizationId=433637338589) is indeed your best option. Note that the Pub/Sub engineers have been working hard on implementing message ordering, but there is no ETA for this feature currently. – Jordan Sep 21 '17 at 18:59
  • @Jordan I can try to implement one of the ordering approaches. But what I miss the most is the ability to start reading the buffer from a known, prior offset (limited by the maximum availability, which is 7 days if I remember correctly). It's my understanding that in Pub/Sub I can receive only messages posted after my subscription; I can't read prior messages. I may update my question to provide more background if you think that it would help elaborate a 100% Google Cloud solution. Thanks! – Renan Sep 21 '17 at 19:26
  • @Renan Seeking to a previous message is also almost ready to be released! You can see that the alpha test happened in this [old Google Groups forum](https://groups.google.com/forum/#!searchin/cloud-pubsub-discuss/offset|sort:relevance/cloud-pubsub-discuss/1uLYENQKFQc/AyBRmtwDBAAJ). I have no ETA for these new features, but they are indeed very close to being released into Pub/Sub production! – Jordan Sep 22 '17 at 17:35
  • @Jordan that's great news! Looking forward to it. Thanks for the update! – Renan Sep 23 '17 at 00:15

Pub/Sub now supports ordering natively. As for the requirement that a subscription (~consumer group in Kafka) exist before you consume, it's very rarely a problem for users. If nothing else, you can create snapshots which allow you to reset a new subscription to the state of any other existing subscription.

This is a bit late, but @Renan, if you are still watching, I would love to hear how you ended up building your system.

Kir Titievsky
  • Thanks for sharing the info about native ordering, this is a nice feature! In the end I used AWS Kinesis initially and then migrated to Kafka (due to other reasons not related to this post). – Renan Jan 13 '21 at 04:33