
I'm trying to build a PaaS like Ably that gives users an easy-to-use pub/sub system. I'm planning to use Kafka, but I don't know whether it's the right fit. Each user can have any number of apps in the PaaS, and each app receives different messages. My idea was to give each app its own Kafka topic, but the number of apps could grow to millions or even billions if I get a lot of users, and Kafka isn't built for that many topics.

Should I use Kafka for this, or look into something else? Maybe there's another way of separating messages between apps that I don't know of. I can't just put everything into a single topic, because then every node would receive trillions of messages it doesn't need.
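One common alternative to topic-per-app (not something the question or answers prescribe) is a single shared topic with a fixed number of partitions. Every message is keyed by its app ID, so all of an app's messages land in the same partition, and each node consumes only the partitions assigned to it. The sketch below is a pure-Python simulation under those assumptions: `NUM_PARTITIONS`, `partition_for`, `assign_partitions` and `route` are hypothetical names, and SHA-256 stands in for the murmur2 hash that Kafka's default partitioner applies to record keys.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 12  # hypothetical partition count for one shared topic


def partition_for(app_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable app-ID -> partition mapping (SHA-256 as a stand-in for Kafka's murmur2)."""
    digest = hashlib.sha256(app_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def assign_partitions(nodes: list[str], num_partitions: int = NUM_PARTITIONS) -> dict[str, list[int]]:
    """Spread partitions round-robin over nodes (similar in spirit to a consumer-group assignor)."""
    assignment: dict[str, list[int]] = defaultdict(list)
    for p in range(num_partitions):
        assignment[nodes[p % len(nodes)]].append(p)
    return dict(assignment)


def route(messages: list[tuple[str, str]], nodes: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Deliver each (app_id, payload) only to the node that owns the app's partition."""
    owner = {p: node for node, parts in assign_partitions(nodes).items() for p in parts}
    delivered: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for app_id, payload in messages:
        delivered[owner[partition_for(app_id)]].append((app_id, payload))
    return dict(delivered)


if __name__ == "__main__":
    msgs = [(f"app-{i % 5}", f"msg-{i}") for i in range(20)]
    for node, batch in route(msgs, ["node-a", "node-b", "node-c"]).items():
        print(node, len(batch))
```

The trade-off is that a node still receives messages for every other app that hashes to its partitions, so it has to filter by app ID; what it avoids is receiving the whole firehose.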

  • Unclear what events you're generating. For example, one `App-Created` topic could be used for all apps and all users, assuming you have some consumer that builds the backing infrastructure for each new-app request. Similarly, a `User-Signup` topic could store user information. The problem would be users/apps _receiving information that is not intended for them_. For that use case, you need specific topics (or even clusters, which can easily be done with Confluent Cloud or MSK) – OneCricketeer Jun 28 '21 at 14:29

2 Answers


Disclaimer: I work at Ably and lead some of our work around Kafka.

First, Ably is not built on Kafka, and Kafka is very much unsuited to the task of a service like Ably, in the same way that Ably does not do what Kafka does. Kafka is a wonderfully powerful tool with a rich ecosystem, but elastic scalability is very much not its thing. Scaling a topic or its partitions is a slow process, and adding nodes to a running, active cluster is not something you can just "do". They do, however, work great together.
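One reason growing a topic is disruptive: when records are keyed, the key-to-partition mapping is effectively `hash(key) % num_partitions`, and Kafka does not move existing data when partitions are added. Most keys therefore start landing in a different partition, and per-key ordering across the change is lost. A small simulation, again with SHA-256 standing in for Kafka's murmur2 hash and a hypothetical `moved_fraction` helper:

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in for keyed partitioning: stable hash of the key modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose partition changes when the count goes from old_n to new_n."""
    moved = sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"app-{i}" for i in range(10_000)]
    # Growing from 12 to 16 partitions: with a uniform hash only about a quarter of keys stay put.
    print(f"{moved_fraction(keys, 12, 16):.2%} of keys change partition")
```

Data already written stays where it was, so consumers relying on "all of app X is in partition P" have to cope with the split around the moment the topic grows.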

There are streaming solutions better suited to this, like Apache Pulsar or Redis (Pub/Sub or Streams), but once again it comes back to trade-offs. Pulsar is better with push subscriptions, has functions, and can do a lot more; Redis clusters can be scaled elastically and quickly. The trade-offs are that Pulsar is VERY complex to run, manage, and scale, and Redis is ephemeral by default. There are other solutions too, like NATS.
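To make "ephemeral" concrete: Redis Pub/Sub is fire-and-forget, so a message published while nobody is subscribed is simply dropped, whereas a Redis Stream is an append-only log that a reader can catch up on later. The toy in-memory model below is not Redis; the `PubSub` and `Stream` classes are hypothetical and only mimic those two delivery semantics, with integer indices standing in for real stream entry IDs.

```python
from collections import defaultdict
from typing import Callable


class PubSub:
    """Fire-and-forget: only subscribers connected at publish time get the message."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to current subscribers; return how many received it (like PUBLISH's reply)."""
        subscribers = self._subscribers.get(channel, [])
        for callback in subscribers:
            callback(message)
        return len(subscribers)


class Stream:
    """Append-only log: entries stay available for readers that connect later."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def add(self, message: str) -> int:
        """Append a message and return its index (a stand-in for a stream entry ID)."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read_after(self, last_seen: int) -> list[str]:
        """Return every entry after index last_seen; pass -1 to read from the start."""
        return self._entries[last_seen + 1:]


if __name__ == "__main__":
    bus, log = PubSub(), Stream()
    print(bus.publish("app-1", "early"))  # 0: nobody subscribed, the message is gone
    log.add("early")
    received: list[str] = []
    bus.subscribe("app-1", received.append)
    bus.publish("app-1", "late")
    log.add("late")
    print(received, log.read_after(-1))
```

Note that real Redis Streams are still held in memory and only survive a restart if RDB or AOF persistence is enabled.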

There is a LOT of tech in Ably that allows its various clusters to scale to tens of millions of connections and channels while maintaining strong guarantees, and none of it is available out of the box from a single open-source vendor.

If Kafka is what you want to use, Redpanda is likely where you should start. Since you are trying to act on each message in a relatively simple fashion, their in-line WASM could be very useful. Or you could use Ably ;)

Ben Gamble
  • I started with Redis Pub/Sub, but you can't use it with Redis Cluster because it slows down as you add more nodes, though they are working on a v2 Pub/Sub that addresses this. Thank you for all of this information. I may end up using Ably, though I'm mostly doing this for learning and Ably would be too easy – Gabriel Mendez Jun 29 '21 at 17:29
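The slowdown described in that comment comes from how Redis Cluster handles classic Pub/Sub: a `PUBLISH` on any node is propagated over the cluster bus to every other node, so bus traffic grows with cluster size. Assuming the "v2 Pub/Sub" mentioned there is what later shipped in Redis 7.0 as sharded Pub/Sub (`SSUBSCRIBE`/`SPUBLISH`), a shard channel is mapped to a hash slot the same way a key is (CRC16 of the name, or of its `{hash tag}`, modulo 16384), and messages stay within the shard that owns that slot. The sketch below implements the slot calculation plus a deliberately rough copy-count model; `bus_copies` and its parameters are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(name: str) -> int:
    """Hash slot for a key or shard channel, honouring the {hash tag} rule."""
    data = name.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full name
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


def bus_copies(messages: int, nodes: int, replicas_per_shard: int, sharded: bool) -> int:
    """Rough model: classic Pub/Sub copies each message to every other node,
    sharded Pub/Sub only to the replicas of the shard owning the channel's slot."""
    if sharded:
        return messages * replicas_per_shard
    return messages * (nodes - 1)


if __name__ == "__main__":
    print(key_slot("app:{42}:events") == key_slot("app:{42}:presence"))  # True: same hash tag
    for nodes in (3, 6, 30):
        print(nodes, bus_copies(1_000, nodes, 1, False), bus_copies(1_000, nodes, 1, True))
```

In this model classic Pub/Sub cost grows with the number of nodes while sharded cost stays flat, which is the behaviour the comment is waiting for.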

Regarding the Kafka part of your question:

Update March 2021: With Kafka's new KRaft mode (short for "Kafka Raft Metadata mode"; in Early Access as of Kafka v2.8), which entirely removes ZooKeeper from Kafka's architecture, a Kafka cluster can handle millions of topics/partitions. See https://www.confluent.io/blog/kafka-without-zookeeper-a-sneak-peek/ for details.

Because KRaft mode is not yet recommended for production use, a ZooKeeper-backed Kafka cluster is currently limited, in practice, to thousands of topics/partitions.
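Even without a hard limit, topic-per-app makes the partition count grow linearly with the number of apps, and every partition replica carries its own log files and metadata that the cluster must track. A back-of-envelope sketch, assuming one partition per app topic, replication factor 3, and a purely hypothetical per-broker partition budget that you would replace with a figure from your own load testing:

```python
def partition_replicas(apps: int, partitions_per_topic: int = 1, replication_factor: int = 3) -> int:
    """Total partition replicas if every app gets its own topic."""
    return apps * partitions_per_topic * replication_factor


def brokers_needed(apps: int, per_broker_budget: int,
                   partitions_per_topic: int = 1, replication_factor: int = 3) -> int:
    """Brokers required to keep each broker under per_broker_budget replicas (integer ceiling)."""
    total = partition_replicas(apps, partitions_per_topic, replication_factor)
    return -(-total // per_broker_budget)


if __name__ == "__main__":
    BUDGET = 4_000  # hypothetical per-broker budget, not an official Kafka limit
    for apps in (10_000, 1_000_000, 1_000_000_000):
        print(f"{apps:>13,} apps -> {partition_replicas(apps):>13,} replicas, "
              f"{brokers_needed(apps, BUDGET):>9,} brokers")
```

The exact numbers matter less than the shape: with topic-per-app, broker count scales with the number of apps rather than with actual traffic.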

If you want to provide a service to other applications and customers, it is better to give each of them a separate topic, so that you can leverage Kafka's authentication and authorization mechanisms to prevent users from accessing other users' data.
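Kafka's ACLs can match topic names by prefix (prefixed resource patterns, available since Kafka 2.0), which pairs well with a per-tenant topic naming convention. The toy model below is not Kafka's authorizer: it assumes a hypothetical `tenant-<name>.` topic prefix and a hypothetical `PrefixAcl` record, and only illustrates an allow-if-a-rule-matches, deny-otherwise check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixAcl:
    principal: str     # e.g. "User:alice", Kafka's principal format
    topic_prefix: str  # hypothetical convention: "tenant-<name>."
    operation: str     # "Read" or "Write"


def is_allowed(acls: list[PrefixAcl], principal: str, topic: str, operation: str) -> bool:
    """Allow only if some rule grants this principal this operation on a matching prefix."""
    return any(
        acl.principal == principal
        and acl.operation == operation
        and topic.startswith(acl.topic_prefix)
        for acl in acls
    )


if __name__ == "__main__":
    acls = [
        PrefixAcl("User:alice", "tenant-alice.", "Read"),
        PrefixAcl("User:alice", "tenant-alice.", "Write"),
        PrefixAcl("User:bob", "tenant-bob.", "Read"),
    ]
    print(is_allowed(acls, "User:alice", "tenant-alice.app-42", "Read"))  # True
    print(is_allowed(acls, "User:bob", "tenant-alice.app-42", "Read"))    # False
    print(is_allowed(acls, "User:bob", "tenant-bob.app-7", "Write"))      # False
```

The trailing `.` in each prefix matters: without a delimiter, a rule for `tenant-1` would also match `tenant-10`'s topics.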

Ran Lupovich
  • That's awesome to hear. However, for the PaaS I'm working on to work properly with a topic per application, the number of topics would have to be "unlimited", so that as I add brokers to the cluster I can add more and more topics. If there's a set limit, then at some point the PaaS will stop working because there are so many applications that no more topics can be created. – Gabriel Mendez Jun 27 '21 at 14:09
  • There is no hard limit setting, and I did not say there is; it's a matter of performance and best practices. Sorry if my answer was misunderstood. – Ran Lupovich Jun 27 '21 at 14:20
  • I see. Your answer was awesome; I just got the idea that there's a limit from what I've read about Kafka topics in articles. It's not really a hard limit, but more of a "when you reach X topics, everything stops working properly". I hope I'm wrong and Kafka (with KRaft) can still perform just fine with millions of topics as I add more and more brokers. – Gabriel Mendez Jun 27 '21 at 14:38