39

I'm working with Storm and it is fine for a lot of use cases. Recently I had a look at Trident, which is a high-level abstraction of Storm. It supports exactly-once processing and makes stateful processing easier.

But now I'm wondering: why can't I always use Trident instead of plain Storm?

What I read so far:

  • Trident processes messages in batches, so latency can be higher.
  • Trident is not yet able to process loops in topologies.

Are there any other disadvantages when using Trident instead of Storm? Because right now, I think the disadvantages I listed above are marginal.

What use cases cannot be implemented with Trident?


Aftermath:

Since I asked the question, my company decided to go for Trident first. We will only use pure Storm when there are performance problems. Sadly, this wasn't an active decision; it just became the default behavior (I wasn't around at that time).

Their assumption was that in most use cases we need state or exactly-once processing, or will need it in the near future. I understand their reasoning, because moving from Storm to Trident or back isn't an easy transformation. In my personal opinion, though, the concept of stream processing without state wasn't understood by everyone, and that was the main reason to use Trident.

Matthias J. Sax
Christian Strempfer
  • 2
    I think you misunderstand: Trident is something that runs on top of Storm and replaces the old concept of transactional topologies. You can, of course, always use Trident instead of core Storm if you like. – Gordon Seidoh Worley Mar 20 '13 at 18:44
  • Hi Gordon, I know that Trident runs on top of Storm. Because of that, I'm wondering why I should use plain Storm at all. It looks like the lower-level API of Storm is only needed for some uncommon use cases. – Christian Strempfer Mar 21 '13 at 08:05
  • 3
    As I understand it, when you have millions of events, the batch processing time is not large (a fraction of a second, I suppose), but the database load is reduced. I think it's possible to implement some timeout and have an additional event. And yes, Trident is a high-level abstraction over Storm, and you could and should use the Storm API for something custom. – Alex Apr 05 '13 at 10:10

5 Answers

46

To answer your question: when shouldn't you use Trident? Whenever you can afford not to.

Trident adds complexity to a Storm topology, lowers performance and generates state. Ask yourself the question: do you need the "exactly once" processing semantics of Trident, or can you live with the "at least once" processing semantics of Storm? For exactly once, use Trident; otherwise don't.

I would also just like to highlight the fact that Storm guarantees that all messages will be processed. Some messages might just be processed more than once.
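The difference between the two guarantees can be sketched in plain Java. This is only an illustration under stated assumptions, not Storm or Trident code: the class and method names are hypothetical, batches are assumed to arrive in order, and a replayed batch is assumed to carry the same transaction id and the same contents as the original. Storing the id of the last applied batch next to the value is the general idea behind transactional state; the real implementation may differ in detail.

```java
// Plain-Java sketch; no Storm classes are used. All names are hypothetical.
class ExactlyOnceSketch {

    /** At-least-once: every delivery is applied, so a replayed batch is counted again. */
    static final class AtLeastOnceCounter {
        private long count;

        void applyBatch(long txId, int batchSize) {
            count += batchSize; // txId is ignored on purpose
        }

        long count() {
            return count;
        }
    }

    /** Exactly-once effect: remembers the last applied batch id and skips replays. */
    static final class TransactionalCounter {
        private long count;
        private long lastTxId = -1; // assumption: real ids are non-negative

        void applyBatch(long txId, int batchSize) {
            if (txId == lastTxId) {
                return; // replay of the batch that was already applied
            }
            count += batchSize;
            lastTxId = txId;
        }

        long count() {
            return count;
        }
    }

    public static void main(String[] args) {
        // Each row is {txId, batchSize}. Batch 2 is delivered twice,
        // simulating a replay after a failure.
        long[][] deliveries = {{1, 3}, {2, 5}, {2, 5}, {3, 2}};

        AtLeastOnceCounter atLeastOnce = new AtLeastOnceCounter();
        TransactionalCounter transactional = new TransactionalCounter();
        for (long[] d : deliveries) {
            atLeastOnce.applyBatch(d[0], (int) d[1]);
            transactional.applyBatch(d[0], (int) d[1]);
        }

        System.out.println("at-least-once count: " + atLeastOnce.count());
        System.out.println("transactional count: " + transactional.count());
    }
}
```

If double counting like this is acceptable for your use case, or your updates are idempotent anyway, the extra bookkeeping buys you nothing.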

John Gilmore
20

If the lowest possible latency is your goal and you don't need exactly-once processing, then using Storm is better than Trident.

ChrisBlom
4

Trident is a high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.x. Storm is a stateless stream processing framework, and Trident provides stateful stream processing.

Do Do
  • Hi, I updated my question to make it more clear: What use cases cannot be implemented with Trident? – Christian Strempfer Jul 09 '13 at 11:41
  • Hi, if you don't need stateful processing, then using Trident would be a waste of resources (CPU, RAM, ...), because it stores state in memory or in an external database. – Do Do Jul 15 '13 at 17:59
  • 1
    If you aren't tracking state then Trident won't "compile in" any extra overhead. In fact, the Trident "compiler" tends to generate more performant topologies than I could by hand. And it includes really optimized, performant helper functions you don't have to code yourself. – ChrisCantrell Jan 21 '14 at 16:35
1

Chris, since both are open source technologies, Trident is just one implementation of a particular scenario on top of Storm, and of course this brings a performance overhead. If Trident cannot meet your requirements, you can create your own state implementation on top of Storm. Over time, Trident has also given rise to higher-level projects such as Trident-ML.

HakkiBuyukcengiz
  • Hi, actually Trident meets all my requirements; that is why I asked why we still need plain Storm. – Christian Strempfer Mar 06 '14 at 15:03
  • 1
    Then we can answer your question like this: if Trident does not meet your performance requirements, you can implement your own, more efficient stateful framework on top of Storm. – HakkiBuyukcengiz Mar 07 '14 at 15:11
0

Assume we want to filter tuples and then add a field to each remaining tuple. With plain Storm we would usually use two bolts, one for filtering and one for adding the field, so each tuple has to be sent on to the next bolt, possibly using a global grouping. Network bandwidth may then become the bottleneck.

With Trident, both steps can run on a single machine, so no regrouping is needed in this case. Use cases like this, in addition to the "exactly once" versus "at least once" question, can help decide which one to use.

Trident is, in effect, a kind of logical grouping of operations.
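As a rough illustration of the idea, here is a plain-Java sketch that runs a filter step and an add-a-field step back to back in one process, so no tuple is handed over the network between them. It does not use Storm or Trident classes; the tuple representation (a `Map`), the field names `value` and `doubled`, and all class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Plain-Java sketch; no Storm or Trident classes are used. All names are hypothetical.
class FusedOperationsSketch {

    // Step 1 (filter): keep only tuples whose "value" field is positive.
    static final Predicate<Map<String, Object>> KEEP_POSITIVE =
            tuple -> ((Integer) tuple.get("value")) > 0;

    // Step 2 (function): copy the tuple and add a derived "doubled" field.
    static final UnaryOperator<Map<String, Object>> ADD_DOUBLED = tuple -> {
        Map<String, Object> out = new HashMap<>(tuple);
        out.put("doubled", ((Integer) tuple.get("value")) * 2);
        return out;
    };

    // Both steps run in the same loop, in the same process:
    // a tuple that passes the filter goes straight to the function.
    static List<Map<String, Object>> process(List<Map<String, Object>> batch) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> tuple : batch) {
            if (KEEP_POSITIVE.test(tuple)) {
                result.add(ADD_DOUBLED.apply(tuple));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> batch = new ArrayList<>();
        for (int value : new int[] {4, -1, 7}) {
            Map<String, Object> tuple = new HashMap<>();
            tuple.put("value", value);
            batch.add(tuple);
        }
        // The tuple with value -1 is dropped; the others gain a "doubled" field.
        for (Map<String, Object> tuple : process(batch)) {
            System.out.println(tuple);
        }
    }
}
```

In plain Storm you can get the same effect by putting both steps into one bolt by hand; the point is that Trident lets you keep them as separate logical operations.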

atul gupta