
I have worked with Storm and Spark, but Samza is quite new.

I do not understand why Samza was introduced when Storm already exists for real-time processing. Spark provides in-memory, near-real-time processing and has other very useful components such as GraphX and MLlib.

What improvements does Samza bring, and what further improvements are possible?

zero323
Amit Kumar
  • There is also Kafka Streams: http://kafka.apache.org/documentation/streams and http://docs.confluent.io/current/streams/index.html – Matthias J. Sax Mar 29 '17 at 23:31
  • Btw: Samza entered the Apache Incubator back in 2013, so it is not really new: http://incubator.apache.org/projects/samza.html – Matthias J. Sax Mar 29 '17 at 23:33
  • Interesting question, but as formulated it is off-topic for Stack Overflow: too broad and prone to subjective opinions. Try to post a more specific question that can be answered with facts alone. – Honza Zidek Mar 30 '17 at 07:56
  • @HonzaZidek, if only I could limit this to a specific problem. I suspected it was too broad, but I do not see a better platform for asking such questions. – Amit Kumar Mar 30 '17 at 09:54

1 Answer


This is a good summary of the differences and pros and cons.

I would just add that Samza, which actually isn't that new, brings a certain simplicity because it is opinionated about using Kafka as its backend, while the others try to be more generic at the cost of simplicity. Samza was pioneered by the same people who created Kafka, who are also the people behind the Kappa Architecture, primarily Jay Kreps, formerly of LinkedIn. That's pretty cool.

Also, the programming models are totally different: Samza processes real-time streams message by message, Spark Streaming (which isn't exactly the same as Spark) works in micro-batches, and Storm uses spouts and bolts that pass tuples.
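To make the contrast between per-message streaming and micro-batching a bit more concrete, here is a minimal plain-Python sketch. It does not use any Samza, Spark, or Storm API; the function names are hypothetical, and batching by message count is a simplifying assumption (Spark Streaming forms batches by time interval).

```python
from typing import Callable, Iterable, Iterator, List


def process_per_message(
    stream: Iterable[int], handle: Callable[[int], int]
) -> Iterator[int]:
    """One-at-a-time model: each message is handled as soon as it arrives."""
    for message in stream:
        yield handle(message)


def process_micro_batches(
    stream: Iterable[int],
    batch_size: int,
    handle_batch: Callable[[List[int]], int],
) -> Iterator[int]:
    """Micro-batch model: buffer messages, then handle each batch as a unit."""
    batch: List[int] = []
    for message in stream:
        batch.append(message)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield handle_batch(batch)


events = [1, 2, 3, 4, 5]

# Per-message: one output per input, produced immediately.
per_message = list(process_per_message(events, lambda m: m * 2))

# Micro-batch: one output per group of (up to) two inputs.
per_batch = list(process_micro_batches(events, 2, sum))
```

The practical difference is latency versus throughput: the per-message version emits a result for every input right away, while the micro-batch version waits until a batch fills up before emitting anything.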

None of these are "better." It all depends on your use cases, the strengths of your team, how the APIs match up with your mental models, quality of support, etc.

You also forgot Apache Flink and Heron, which Twitter built because Storm started to fail them. Then again, very few need to operate at the scale of Twitter.

Vidya