
I'm trying to get maximum performance out of my Storm setup. I'm sending tens of thousands of messages through Kafka, which are consumed by the Storm topology.

When I look at the Storm UI, I notice that all the messages are going to a single executor rather than being load balanced across all the executors (see attached screenshot).

Is there a reason for this, and how can I load balance the Kafka messages?

Storm UI Screenshot

needmorehalp
  • You need to provide more information about your Storm topology, and how you're reading from Kafka via Storm's Kafka spout. For example, how many partitions does your Kafka input topic have? And what is the parallelism setting of your Kafka spout in Storm? And, after the spout, what bolts have you configured, and what parallelism setting and stream groupings (like random shuffling, which would be helpful for load balancing) are you using for these bolts? – miguno Jul 11 '16 at 07:41
  • @miguno My Storm topology is pretty simple right now: it reads from the Kafka spout and sends the messages to HBase for persistence. The topic I created has 3 partitions, with a parallelism of 3. The topology has a total of 3 workers as well. The HBase bolt I'm using has a parallelism hint of 10 and uses shuffle grouping. – needmorehalp Jul 11 '16 at 18:23
  • Can you test it with a parallelism of 1? What does it show? – Aftab Jul 13 '16 at 14:05
  • The maximum parallelism you can have on a KafkaSpout is the number of partitions. Possible duplicate of http://stackoverflow.com/questions/18267834/storm-kafka-multiple-spouts-how-to-share-the-load – Somnath Sarode Jul 13 '16 at 14:58
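The point that a Kafka spout's useful parallelism is capped by the partition count can be illustrated with a small Java model. It assumes, as a simplification, that each partition is read by exactly one spout task and that partitions are dealt out round-robin (partition p goes to task p % numTasks). `SpoutPartitionModel`, `assign`, and `idleTasks` are hypothetical names; this is not the storm-kafka source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each Kafka partition is read by exactly one spout task,
// and partitions are assigned round-robin (partition p -> task p % numTasks).
public class SpoutPartitionModel {

    // Returns, for each spout task index, the partitions that task would read.
    static List<List<Integer>> assign(int numPartitions, int numTasks) {
        List<List<Integer>> byTask = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            byTask.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            byTask.get(p % numTasks).add(p);
        }
        return byTask;
    }

    // Counts spout tasks that get no partition and therefore never emit tuples.
    static int idleTasks(int numPartitions, int numTasks) {
        int idle = 0;
        for (List<Integer> parts : assign(numPartitions, numTasks)) {
            if (parts.isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // 3 partitions, spout parallelism 3: one partition per task.
        System.out.println(assign(3, 3));    // [[0], [1], [2]]
        // 3 partitions, spout parallelism 5: two tasks have nothing to read.
        System.out.println(assign(3, 5));    // [[0], [1], [2], [], []]
        System.out.println(idleTasks(3, 5)); // 2
    }
}
```

Under this model a spout task can only emit what its own partitions contain, so raising the spout parallelism above 3 for a 3-partition topic adds idle executors rather than throughput.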

1 Answer


Since you have 3 partitions, try creating the Kafka spout with a parallelism hint of 3 and the HBase bolt with a parallelism hint of 3. Use partial key grouping on the HBase bolt to load balance the messages across the bolt's tasks on the basis of a key.
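Partial key grouping is based on the "power of two choices" idea: each key maps to two candidate tasks, and each tuple goes to whichever candidate has received fewer tuples so far. A hot key is therefore split across two bolt tasks instead of pinning one. The Java sketch below is a hypothetical model of that idea, not Storm's implementation. The hash functions, the workload in which 70% of tuples share the row key "hot", and all class and method names are assumptions. For comparison, the sketch also models plain fields grouping as a single hash of the key.

```java
import java.util.Arrays;

// Hypothetical model of partial key grouping ("power of two choices"):
// each key hashes to two candidate tasks; a tuple goes to the less-loaded one.
public class PartialKeyGroupingModel {

    private final long[] load; // tuples sent to each bolt task so far

    PartialKeyGroupingModel(int numTasks) {
        this.load = new long[numTasks];
    }

    // First candidate: plain hash of the key.
    private int firstChoice(String key) {
        return Math.floorMod(key.hashCode(), load.length);
    }

    // Second candidate: a different hash, offset so it never equals the first
    // (a simplification made for this sketch).
    private int secondChoice(String key) {
        int n = load.length;
        if (n == 1) {
            return 0;
        }
        int offset = 1 + Math.floorMod(key.hashCode() * 31 + 17, n - 1);
        return (firstChoice(key) + offset) % n;
    }

    // Sends the tuple to the less-loaded candidate (ties go to the first).
    int chooseTask(String key) {
        int a = firstChoice(key);
        int b = secondChoice(key);
        int target = load[a] <= load[b] ? a : b;
        load[target]++;
        return target;
    }

    long[] loads() {
        return load.clone();
    }

    // Hypothetical skewed workload: 7 of every 10 tuples carry the key "hot".
    static String skewedKey(int i) {
        return (i % 10 < 7) ? "hot" : "cold-" + (i % 10);
    }

    // Fields grouping model for comparison: one hash, so one task per key.
    static int fieldsTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int tasks = 3;
        PartialKeyGroupingModel pkg = new PartialKeyGroupingModel(tasks);
        long[] fieldsLoad = new long[tasks];
        for (int i = 0; i < 900; i++) {
            String key = skewedKey(i);
            pkg.chooseTask(key);
            fieldsLoad[fieldsTask(key, tasks)]++;
        }
        // partial key grouping: [270, 315, 315]
        System.out.println("partial key grouping: " + Arrays.toString(pkg.loads()));
        // fields grouping:      [90, 720, 90]
        System.out.println("fields grouping:      " + Arrays.toString(fieldsLoad));
    }
}
```

In this run, the single-hash model sends every "hot" tuple to task 1, while the two-choice model shares that key between tasks 1 and 2. The trade-off is that one key's tuples no longer all reach the same task, so any per-key state would have to be combined downstream.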

Daniccan