
I've been looking at Esper (and Storm) for stream processing. Esper seems to do exactly what I want, i.e. rolling means, medians, complex queries, etc., but one thing has me wondering.

How would I scale out to multiple instances with Esper?

As far as I understand, Storm handles distributed processing, but with Esper you're on your own.

I wouldn't need to do it for the foreseeable future, but as we grow, our data volumes will grow too, and we would then need to scale out as well. Most likely we would be deployed on Amazon EC2.

Would I need to run multiple servers and shard the data before sending it to my Esper application?

Is there a more graceful way of handling it?

-Sajal

sajal

2 Answers


You can run an Esper instance within a bolt, meaning that Storm will handle tuple/event federation, and Esper will handle the CEP on events it receives in a given bolt.

This has some code and information about embedding Esper in a Storm bolt: http://tomdzk.wordpress.com/2011/09/28/storm-esper/

However, you need a use case that supports relatively stateless Esper engines, each handling a subset of the data.

For example, suppose you are computing the average daily temperature by city. If you don't distribute your tuples using fieldsGrouping on the city field (shuffleGrouping distributes tuples randomly, ignoring the city), each Esper bolt could end up with a different subset of the data for the same city.
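To make the grouping point concrete, here is a minimal, library-free Python sketch. It is not Storm or Esper code. It assumes 3 bolt instances, uses a CRC32 hash of the city as a stand-in for fieldsGrouping, and uses round-robin as a deterministic stand-in for shuffleGrouping (Storm's real shuffle is random). All function names here are hypothetical. Each simulated bolt keeps its own per-city running sum and count, roughly the way an isolated Esper engine keeps its own window state.

```python
import zlib
from collections import defaultdict

NUM_BOLTS = 3  # hypothetical number of Esper bolt instances


def fields_grouping(seq, city, num_bolts=NUM_BOLTS):
    """Stand-in for fieldsGrouping on "city": same city -> same bolt, always."""
    return zlib.crc32(city.encode("utf-8")) % num_bolts


def shuffle_grouping(seq, city, num_bolts=NUM_BOLTS):
    """Deterministic stand-in for shuffleGrouping: ignores the city entirely."""
    return seq % num_bolts


def route(events, grouping):
    """Send (city, temperature) events to bolts; each bolt has private per-city state."""
    bolts = [defaultdict(lambda: [0.0, 0]) for _ in range(NUM_BOLTS)]
    for seq, (city, temp) in enumerate(events):
        state = bolts[grouping(seq, city)][city]
        state[0] += temp  # running sum seen by this bolt only
        state[1] += 1     # running count seen by this bolt only
    return bolts


def cities_split_across_bolts(bolts):
    """Cities whose readings landed in more than one bolt (no bolt sees them whole)."""
    seen = defaultdict(set)
    for i, bolt in enumerate(bolts):
        for city in bolt:
            seen[city].add(i)
    return {city for city, idxs in seen.items() if len(idxs) > 1}


def local_averages(bolts):
    """Each bolt's own view: city -> average over only the events it received."""
    return [{city: s / n for city, (s, n) in bolt.items()} for bolt in bolts]
```

With fieldsGrouping-style routing, every city lives in exactly one bolt, so each bolt's local average is the true per-city average. With the shuffle-style routing, the same readings are scattered across bolts and no single bolt can compute the full average without some extra merge step.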

Basically, be sure to read up on how data is distributed in a Storm topology before committing to this architecture.

cmonkey
  • That's an interesting approach. However, for our case it looks like a single instance of Esper would be fine for the foreseeable future; after that we can either do Storm + Esper, or shard manually. Using your temperature example, we would be tracking 10-15 cities, each getting an even amount of data, and each city can be processed completely separately, with only the summaries compared later. – sajal Mar 20 '12 at 20:35
  • As far as I know, most Esper functions are context-based; in other words, they're stateful. How do you make Storm send the right tuple to the right Esper bolt? – Jet Geng Sep 21 '12 at 03:46

From your question, it seems EsperHA might be relevant. Have you looked at it?

EsperHA is a complete solution for zero-downtime ESP/CEP event processing. It combines Esper with local in-memory caching, resilient overflow to disk or database and clustered configuration with hot backup capabilities.

Antony Stubbs
  • It seems EsperHA only takes care of replication, not distribution. My question is specifically about distributing load across multiple instances. I'm not asking about availability. – sajal Mar 31 '12 at 11:06
  • I am aware of EsperHA, but have not used it. It appears to be a paid-for product, and our shop was only exploring free options. – cmonkey Sep 23 '12 at 21:00