
I have created a simple program that reads from a file and generates another file. It is working perfectly. What I am worried about is how to make it a real-time topology: if I modify the source file (say, add a new record), that record should appear in my target file without redeploying the topology on the cluster. What else do I need to configure to achieve this behaviour? Below is the code that submits the topology locally:

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;

    Config conf = new Config();
    conf.setDebug(false);
    conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

    TopologyBuilder builder = new TopologyBuilder();
    // The spout reads records from the source file, the bolt writes them out.
    builder.setSpout("file-reader", new FileReaderSpout(args[0]));
    builder.setBolt("file-writer", new WriteToFileBolt(args[0]))
           .shuffleGrouping("file-reader");

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("File-To-File", conf, builder.createTopology());
    Thread.sleep(10000);
    cluster.shutdown();
user2435082

2 Answers


What you can probably do is use a message queue integrated with your Storm cluster. Kafka could be a very good candidate for this. It is basically a publish-subscribe messaging system: producers are responsible for adding messages to the queue, and consumers on the other end retrieve them.

So if you integrate Kafka with Storm, then as soon as your producer sends/publishes a message to the queue it will be available to your Storm topology. There is something called KafkaSpout, which is a normal spout implementation capable of reading from a Kafka queue.

So it goes like this: your topology starts with a KafkaSpout (subscribed to a particular topic) that emits as soon as it receives anything, and you then chain its output to your corresponding bolts.
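
As a rough sketch of how that wiring could look with the storm-kafka module (the ZooKeeper address, the topic name, and reusing the WriteToFileBolt from the question are assumptions for illustration):

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.TopologyBuilder;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    // Point the spout at the ZooKeeper ensemble Kafka is registered with
    // and at the topic your producer publishes records to.
    ZkHosts hosts = new ZkHosts("localhost:2181");                 // assumed ZooKeeper address
    SpoutConfig spoutConfig =
            new SpoutConfig(hosts, "file-records", "/kafka-spout", "file-to-file-spout");
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // emit plain strings

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-reader", new KafkaSpout(spoutConfig));
    builder.setBolt("file-writer", new WriteToFileBolt(args[0]))      // bolt from the question
           .shuffleGrouping("kafka-reader");

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("Kafka-To-File", new Config(), builder.createTopology());

The spout keeps its read offsets in ZooKeeper, so new messages published by the producer flow into the topology without any redeployment.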

You can also look at Kestrel as an alternative to Kafka. You should choose based on what exactly solves your purpose.

user2720864
  • Thanks for your reply. If my source is a DB table and the target is a file, how can I achieve real-time processing? Can I achieve it without using any other 3rd-party jar (i.e. Kafka)? – user2435082 Oct 23 '13 at 09:51
  • In my understanding, to do anything in real time you need a constant source of data (a stream) to work on; that's where the concept of a queue comes in. You can query a DB, retrieve a set of information (result set/rows) and process it like a batch, but what would you do if someone adds a new record to the DB? You then need some kind of mechanism to detect that and make it available for processing. Could you please share what exactly you are trying to achieve? – user2720864 Oct 23 '13 at 14:04
  • I want exactly what you said: if someone adds a new record to the DB, what mechanism do I need to detect that and make it available for processing? I just want to know what Storm provides for this detection (class name etc.). – user2435082 Oct 24 '13 at 07:34
  • One very ugly way could be to use some kind of polling at a certain interval to identify whether anything new has been added (you must have your own logic to determine what has already been read, though); a rough sketch of that idea follows after these comments. But in that case you need to keep polling even when no new record has been added, so this certainly does not look like a recommended approach to me – user2720864 Oct 24 '13 at 08:40
  • Yes, this is not the right approach. Do you have any idea about TridentTopology? What is the use of that topology? – user2435082 Oct 24 '13 at 09:21
  • Trident is a high-level abstraction built on top of Storm, capable of processing batches and maintaining state. A few places you can look: the [tutorial](https://github.com/nathanmarz/storm/wiki/Trident-tutorial), and also http://stackoverflow.com/questions/15520993/storm-vs-trident-when-not-to-use-trident – user2720864 Oct 24 '13 at 09:33
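
To make the polling idea from the comments concrete, here is a minimal sketch of such a spout. The records table, its id/payload columns, and the JDBC URL are all assumptions, and it still queries even when nothing new has arrived, which is exactly the drawback mentioned above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Polls a table for rows with an id greater than the last one seen and emits them.
    public class DbPollingSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private long lastSeenId = 0;   // our own "what has already been read" bookkeeping

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/test"); // assumed URL
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT id, payload FROM records WHERE id > " + lastSeenId + " ORDER BY id")) {
                while (rs.next()) {
                    lastSeenId = rs.getLong("id");
                    collector.emit(new Values(rs.getString("payload")));
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }

Wiring builder.setSpout("db-reader", new DbPollingSpout()) in place of the file-reader spout from the question would then feed newly inserted rows into the rest of the topology.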

Having read your comments in the other answer, you probably need to implement a queueing system before updating the rows in the DB.

I have personally used RabbitMQ with Storm, and I know Kafka is also an option. Specifically, try adding a queue such that one part of your topology (it can be outside Storm too) reads off the queue and updates the DB, and the other part implements the processing logic you want.
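
For the part that reads off the queue, a bare-bones spout built on the plain RabbitMQ Java client could look like the sketch below; the broker address and the storm-input queue name are assumptions, and community RabbitMQ spout implementations exist if you would rather not write your own:

    import java.util.Map;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.GetResponse;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Pulls one message at a time from a RabbitMQ queue and emits it as a tuple.
    public class RabbitMqSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private transient Channel channel;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            try {
                ConnectionFactory factory = new ConnectionFactory();
                factory.setHost("localhost");                      // assumed broker address
                Connection connection = factory.newConnection();
                channel = connection.createChannel();
                channel.queueDeclare("storm-input", true, false, false, null); // assumed queue name
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void nextTuple() {
            try {
                GetResponse response = channel.basicGet("storm-input", true); // auto-ack for simplicity
                if (response != null) {
                    collector.emit(new Values(new String(response.getBody(), "UTF-8")));
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }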

Implementing a trigger to send events to the Storm topology is probably a bad idea, unless you have no other option.

-- Michael

mvogiatzis
  • Thanks Michael. Yes, I need to implement a queue. Can you please suggest what Storm provides to implement queuing? I don't want to use any other 3rd-party library. – user2435082 Oct 28 '13 at 05:41
  • Storm doesn't provide any queuing mechanism as far as I know. – mvogiatzis Nov 02 '13 at 14:33