I am currently developing a Storm topology for processing raw machine measurement data, but I am running into problems with the spout that I cannot explain.
I am running a simple Storm topology on Azure HDInsight, written in Java. Events are read from an Event Hub using the Microsoft EventHub spout (version 0.9). The Event Hub has 8 partitions, which means that I also need 8 instances of the EventHubSpout.
However, after the topology has been running for a couple of hours, the spouts stop receiving messages one after another, until every spout is silent. No feedback is given whatsoever. The Event Hub is still up and running when I inspect it through other means. Storm and the spouts are simply not registering anything anymore.
I have several ideas about what might be the problem here:
First, we recently changed the messages that are sent to the topology. Through batching (with parsing done in the topology itself), we decreased the number of messages significantly. The size of each message has grown enormously as well. This could cause two problems:
Each partition only gets a message roughly every 4 seconds, which is a very low rate for Storm. Could it be that the spout automatically times out and crashes because of this?
Could a message sometimes be too big, causing the spout to crash or behave strangely?
Second, from time to time the Event Hub can be briefly offline, because of some Azure error or because the network is unavailable. That could mean that the Event Hub is not sending messages for a while. Could it be that the spout shuts down while it is not receiving data and then cannot wake up again?
In each of these cases, shouldn't the EventHub spout recover automatically? What can be done to debug or tackle this problem?
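To narrow down when each spout goes quiet, one thing I am considering is a small helper that records the last time a message was seen per partition and reports the partitions that have been silent for too long. Below is a minimal sketch of that idea in plain Java. The class name `PartitionStallDetector` and its methods are hypothetical, and it assumes the partition id is available wherever the messages are handled (for example in the first bolt), which I have not confirmed the spout provides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical debugging helper, not part of Storm or the EventHub spout.
 * Remembers when each partition last delivered a message and lists the
 * partitions that have been silent longer than a threshold.
 */
class PartitionStallDetector {
    // Partition id -> timestamp (ms) of the last message seen from it.
    private final Map<Integer, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long stallThresholdMillis;

    PartitionStallDetector(int partitionCount, long stallThresholdMillis, long nowMillis) {
        this.stallThresholdMillis = stallThresholdMillis;
        // Treat start-up time as the initial "last seen" time for every partition.
        for (int p = 0; p < partitionCount; p++) {
            lastSeenMillis.put(p, nowMillis);
        }
    }

    /** Call this whenever a message from the given partition is processed. */
    void recordMessage(int partitionId, long nowMillis) {
        lastSeenMillis.put(partitionId, nowMillis);
    }

    /** Returns the sorted ids of partitions silent for longer than the threshold. */
    List<Integer> stalledPartitions(long nowMillis) {
        List<Integer> stalled = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : lastSeenMillis.entrySet()) {
            if (nowMillis - entry.getValue() > stallThresholdMillis) {
                stalled.add(entry.getKey());
            }
        }
        Collections.sort(stalled);
        return stalled;
    }
}
```

With 8 partitions and a message roughly every 4 seconds per partition, a threshold of about a minute (an arbitrary choice) should be long enough to avoid false alarms. Calling `recordMessage(partitionId, System.currentTimeMillis())` per message and logging `stalledPartitions(System.currentTimeMillis())` periodically would at least show which partition stops first and at what time.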