
I am currently developing a Storm topology for processing raw machine measurement data, but I am running into unexplained problems with the spout.

I am running a simple Storm topology, written in Java, on Azure HDInsight. Events are read from an Event Hub using the Microsoft Event Hub spout (version 0.9). The Event Hub has 8 partitions, so I also run 8 instances of the EventHubSpout.
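To illustrate why the spout parallelism is tied to the partition count, here is a small stdlib-only model. It assumes partitions are assigned to spout tasks round-robin (partition `p` goes to task `p % taskCount`); `PartitionAssignment` and `partitionsFor` are hypothetical names, not the spout's API. With 8 partitions and 8 tasks each task owns exactly one partition, while 4 tasks would each own two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT the EventHubSpout API: illustrates an assumed
// round-robin assignment of Event Hub partitions to spout tasks.
class PartitionAssignment {

    /** Returns the partition ids owned by spout task taskIndex. */
    static List<Integer> partitionsFor(int taskIndex, int taskCount, int partitionCount) {
        List<Integer> owned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (p % taskCount == taskIndex) { // assumed round-robin rule
                owned.add(p);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        int partitions = 8;
        for (int tasks : new int[] {8, 4}) {
            System.out.println(tasks + " spout tasks:");
            for (int t = 0; t < tasks; t++) {
                System.out.println("  task " + t + " -> partitions " + partitionsFor(t, tasks, partitions));
            }
        }
    }
}
```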

However, after the topology has run for a couple of hours, the spouts stop receiving messages one after another until every spout is silent. No error or other feedback is given. When I inspect the Event Hub through other means, it is still up and running; Storm and the spouts simply stop registering anything.

I have several ideas about what might be the problem here:

  • First, we recently changed the messages that are sent to the topology. By batching them (and parsing the batches in the topology itself), we significantly decreased the number of messages, while the size of each message has grown enormously. This could cause two problems:

    1. Each partition only receives a message roughly every 4 seconds, which is a very low rate for Storm. Could the spout time out and crash because of this?

    2. Could a message sometimes be too big, causing the spout to crash or behave strangely?

  • Second, the Event Hub is occasionally offline for a short time because of an Azure error or network unavailability, so it may not deliver any messages for a while. Could the spout shut down while it receives no data and then never wake up again?
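One way to rule out the message-size concern is to cap the size of each batched message on the sending side. The stdlib-only sketch below greedily packs newline-separated records into batches whose UTF-8 size stays under a byte limit. The 256 KB value is an assumption based on the Event Hubs per-event limit for the standard tier at the time and should be checked for your tier; `SizeCappedBatcher` and `packIntoBatches` are hypothetical helpers, not part of any SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sender-side helper; not part of the Event Hubs SDK.
class SizeCappedBatcher {

    // Assumed per-event limit; verify against current Event Hubs quotas for your tier.
    static final int MAX_EVENT_BYTES = 256 * 1024;

    /** Packs newline-separated records into batches of at most maxBytes (UTF-8). */
    static List<String> packIntoBatches(List<String> records, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int recordsInBatch = 0;
        for (String record : records) {
            int recordBytes = record.getBytes(StandardCharsets.UTF_8).length;
            if (recordBytes > maxBytes) {
                throw new IllegalArgumentException("record of " + recordBytes + " bytes exceeds limit");
            }
            // +1 accounts for the '\n' separator when the batch already has records.
            int needed = recordsInBatch == 0 ? recordBytes : currentBytes + 1 + recordBytes;
            if (needed > maxBytes) {
                batches.add(current.toString()); // flush the full batch
                current.setLength(0);
                recordsInBatch = 0;
                needed = recordBytes;
            }
            if (recordsInBatch > 0) {
                current.append('\n');
            }
            current.append(record);
            currentBytes = needed;
            recordsInBatch++;
        }
        if (recordsInBatch > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("m1;1.0", "m2;2.5", "m3;0.7");
        // A tiny limit makes the splitting visible (each record is 6 bytes).
        System.out.println(packIntoBatches(records, 13).size());
        System.out.println(packIntoBatches(records, MAX_EVENT_BYTES).size());
    }
}
```

A greedy packer keeps the number of messages low, which was the goal of batching, while guaranteeing that no single event crosses the assumed limit.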

In each of these cases, shouldn't the Event Hub spout recover automatically? What can be done to debug or solve this problem?
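If a receiver does not recover on its own, a common pattern is to drop the broken connection after a failure and retry with exponential backoff instead of giving up. The stdlib-only sketch below shows that pattern under stated assumptions: `ReconnectingReader` and its nested `EventSource` interface are hypothetical stand-ins, not classes from the Event Hub spout, and the delays are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical reconnect-with-backoff wrapper; not an Event Hubs or Storm type.
class ReconnectingReader {

    /** Hypothetical partition receiver: returns an event, or null if none arrived in time. */
    interface EventSource {
        String receive() throws Exception;
    }

    private final Supplier<EventSource> factory; // opens a fresh connection
    private final long baseDelayMs;
    private final long maxDelayMs;
    private EventSource source;
    private int consecutiveFailures = 0;

    ReconnectingReader(Supplier<EventSource> factory, long baseDelayMs, long maxDelayMs) {
        this.factory = factory;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Delay before retry number n (n >= 1): base * 2^(n-1), capped at maxDelayMs. */
    long backoffMs(int failures) {
        long delay = baseDelayMs << Math.max(0, Math.min(failures - 1, 30));
        return Math.min(delay, maxDelayMs);
    }

    /** Reads one event; on failure drops the connection, backs off, and returns null. */
    String poll() {
        try {
            if (source == null) {
                source = factory.get();
            }
            String event = source.receive(); // null only means "no data yet", not an error
            consecutiveFailures = 0;
            return event;
        } catch (Exception e) {
            consecutiveFailures++;
            source = null; // force a reconnect on the next poll
            sleepQuietly(backoffMs(consecutiveFailures));
            return null;
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        AtomicInteger connects = new AtomicInteger();
        // Simulated outage: the first two connections fail, the third one works.
        Supplier<EventSource> factory = () -> {
            int attempt = connects.incrementAndGet();
            return () -> {
                if (attempt <= 2) throw new java.io.IOException("hub unavailable");
                return "event-from-connection-" + attempt;
            };
        };
        ReconnectingReader reader = new ReconnectingReader(factory, 10, 1000);
        for (int i = 0; i < 3; i++) {
            System.out.println(reader.poll());
        }
    }
}
```

The key design choice is that an empty receive resets nothing and is not treated as an error; only exceptions trigger a reconnect, so a quiet partition never causes the reader to give up.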

cyrion1000

3 Answers


I searched the Event Hub spout code for logic that automatically recovers the spout from exceptions, but there does not seem to be any.

However, I think the issue might be caused by Storm bugs like https://issues.apache.org/jira/browse/STORM-329.

To debug a problem like this, you can follow "How to debug Apache Storm in Eclipse?" to enable debugging in the worker JVM of HDInsight Storm and attach a remote debugger from Eclipse.
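For reference, remote debugging of a worker JVM is usually enabled by adding the standard JDWP agent option to the worker's JVM options; in Storm the per-topology setting is `topology.worker.childopts`. The sketch below only builds that configuration entry in a plain map (Storm's `Config` class extends `HashMap<String, Object>`), so it runs without Storm on the classpath. Port 8000 is an arbitrary assumption, and `suspend=n` keeps the worker from blocking until a debugger attaches.

```java
import java.util.HashMap;
import java.util.Map;

class RemoteDebugOptions {

    // Same key as Storm's Config.TOPOLOGY_WORKER_CHILDOPTS.
    static final String WORKER_CHILDOPTS = "topology.worker.childopts";

    /** Standard JDWP agent string; suspend=n lets the worker start without a debugger. */
    static String jdwpAgent(int port) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" + port;
    }

    public static void main(String[] args) {
        // With Storm on the classpath, the same put() works on a Config instance.
        Map<String, Object> conf = new HashMap<>();
        conf.put(WORKER_CHILDOPTS, jdwpAgent(8000)); // 8000 is an arbitrary example port
        System.out.println(conf.get(WORKER_CHILDOPTS));
    }
}
```

If more than one worker of the topology runs on the same node, a fixed port will collide, so a single worker is the simplest setup for this kind of session.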

Hope it helps. Best Regards.

Peter Pan
  • Thank you for your reply! I will give remote debugging through Eclipse a try. In addition, I have updated the problem description. Any additional suggestions? – cyrion1000 Feb 15 '16 at 14:27

Is it possible that you are not acking the tuples? If you don't ack, the spout assumes the messages are still "pending" and won't take new ones from the Event Hub.
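The mechanism described here can be modeled as a bounded window of in-flight messages: once the number of un-acked messages reaches a cap, nothing new is read from the partition until acks free up slots. The stdlib-only sketch below is a hypothetical model of that behavior, not the spout's code; the cap of 3 is arbitrary, and any real limit depends on the spout's configuration. In real Storm, tuples that are never acked eventually time out (`topology.message.timeout.secs`) and are failed, which this model leaves out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one spout partition with a cap on un-acked (pending) messages.
class PendingWindow {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>();
    private long nextOffset = 0;

    PendingWindow(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Emits the next message if the window has room; returns its offset, or -1 if blocked. */
    long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1; // window full: nothing new is read until something is acked
        }
        long offset = nextOffset++;
        pending.add(offset);
        return offset;
    }

    /** Acking removes the message from the window and frees a slot. */
    void ack(long offset) {
        pending.remove(offset);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingWindow window = new PendingWindow(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("emit -> " + window.tryEmit()); // 0, 1, 2, then -1, -1
        }
        window.ack(0);
        System.out.println("after ack(0): emit -> " + window.tryEmit()); // 3
    }
}
```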

Do you see any errors from the spout?

Amir
  • I reduced my topology to a single instance of one bolt (plus the necessary 8 instances of the Event Hub spout) that only does a little parsing, and I made sure it acks each tuple once it has finished. There is still no error message from the spout whatsoever. I only see the occasional checkpointing of the spout until it stops completely: 2016-02-13 05:03:46 c.m.e.s.SimplePartitionManager [INFO] saving state 545487898960, 2016-02-13 05:03:46 c.m.e.s.ZookeeperStateStore [INFO] data was saved. path: /eventhubspout/TimeOrderingTopology/test-ns/test/partitions/7, data: 545487898960 – cyrion1000 Feb 15 '16 at 14:34
  • Sorry, acking was my only idea. However, I sometimes get "timeout" errors from the spout (when no message is received for some amount of time), but the spout does recover and delivers the messages once they arrive. That leads me to believe there is a recovery process; it just isn't working here for some reason. – Amir Feb 16 '16 at 08:28
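Because the only visible symptom is that the per-partition checkpoints (the `saving state` lines) stop advancing, one debugging aid is a watchdog that remembers the last checkpoint value seen for each partition and reports partitions whose value has not moved within a threshold. The stdlib-only sketch below is hypothetical; how you feed it values (for example by reading the ZooKeeper paths shown in the log, or from your own bolt) is left open, and the current time is passed in explicitly so it can be tested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical watchdog: flags partitions whose checkpointed offset stops advancing.
class CheckpointWatchdog {
    private final long stallThresholdMs;
    private final Map<Integer, Long> lastOffset = new HashMap<>();
    private final Map<Integer, Long> lastAdvanceAtMs = new HashMap<>();

    CheckpointWatchdog(long stallThresholdMs) {
        this.stallThresholdMs = stallThresholdMs;
    }

    /** Records an observed offset; only a strictly larger offset counts as progress. */
    void observe(int partition, long offset, long nowMs) {
        Long previous = lastOffset.get(partition);
        if (previous == null || offset > previous) {
            lastOffset.put(partition, offset);
            lastAdvanceAtMs.put(partition, nowMs);
        }
    }

    /** Partitions (sorted) whose offset has not advanced for longer than the threshold. */
    List<Integer> stalledPartitions(long nowMs) {
        List<Integer> stalled = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastAdvanceAtMs.entrySet()) {
            if (nowMs - e.getValue() > stallThresholdMs) {
                stalled.add(e.getKey());
            }
        }
        stalled.sort(null); // natural ordering
        return stalled;
    }

    public static void main(String[] args) {
        CheckpointWatchdog watchdog = new CheckpointWatchdog(10 * 60_000L); // 10 minutes
        long t0 = 0;
        watchdog.observe(7, 545487898960L, t0);          // checkpoint value from the log above
        watchdog.observe(3, 1000L, t0);
        watchdog.observe(3, 2000L, t0 + 15 * 60_000L);   // partition 3 keeps advancing
        System.out.println(watchdog.stalledPartitions(t0 + 15 * 60_000L)); // [7]
    }
}
```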

I faced a similar issue. After checking all the Storm Event Hub spout code, I realised that there are no waits on the Storm side, at least, so something has to be wrong on the Azure Event Hub side itself. The following increased the number of events I receive from the Event Hub per minute: the throughput units in Azure were set to the default of 1, and the spout received 12 events/min. After raising the throughput units to 5, it started receiving 500 events/min.

This is a good article along the same lines: https://blog.bennymichielsen.be/2015/08/11/scaling-an-azure-event-hub-throughput-units/
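To estimate how many throughput units a workload needs, you can compare its rates against the per-unit limits. The sketch below assumes the limits Azure documented for Event Hubs throughput units (ingress up to about 1 MB/s or 1,000 events/s, egress up to about 2 MB/s per unit); verify them against current documentation. `ThroughputUnitEstimator`, its `readers` parameter (how many consumers read the full stream), and the 200 KB batch size in the example are hypothetical.

```java
// Hypothetical estimator; the per-unit limits below are assumptions to verify.
class ThroughputUnitEstimator {
    static final double INGRESS_BYTES_PER_TU = 1024 * 1024;     // assumed ~1 MB/s
    static final double INGRESS_EVENTS_PER_TU = 1000;           // assumed 1,000 events/s
    static final double EGRESS_BYTES_PER_TU = 2 * 1024 * 1024;  // assumed ~2 MB/s

    /** Smallest number of units (at least 1) that covers every assumed limit. */
    static int requiredUnits(double eventsPerSec, double avgEventBytes, int readers) {
        double ingressBytes = eventsPerSec * avgEventBytes;
        double egressBytes = ingressBytes * readers; // each reader pulls the full stream
        double units = Math.max(ingressBytes / INGRESS_BYTES_PER_TU,
                Math.max(eventsPerSec / INGRESS_EVENTS_PER_TU, egressBytes / EGRESS_BYTES_PER_TU));
        return Math.max(1, (int) Math.ceil(units));
    }

    public static void main(String[] args) {
        // 8 partitions x one batch every 4 s = 2 events/s; 200 KB per batch is assumed.
        double eventsPerSec = 8 / 4.0;
        System.out.println(requiredUnits(eventsPerSec, 200 * 1024, 1));
    }
}
```

With few but large batched messages, the byte limits rather than the event-count limit determine the result, so the average message size is the number worth measuring first.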

Let me know if this helped you as well.

Mrunal Pagnis