9

When choosing to parallelize tasks I normally use Spark. Reading articles about parallel processing in Akka, such as http://blog.knoldus.com/2011/09/19/power-of-parallel-processing-in-akka/, it seems that using Akka to parallelize operates at a lower level. Spark abstracts some of the lower-level concepts from the user, such as map and reduce, and provides high-level abstractions for grouping and filtering data. Is Akka a competitor to Spark for parallelizing tasks, or are they solving different problems?

Before deciding which to use, what should I consider?

maasg
blue-sky

2 Answers

10

Spark is actually built on top of Akka (at least, at the time of this writing). :) (http://akka.io/community/ - check out "projects using akka")

That said, the big value Spark provides is in the abstractions you mentioned, mostly (IMO) the concept of an RDD and operations over RDDs. So if your problem domain fits into that nicely, go ahead with Spark; otherwise, write your own Akka code.
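To make the contrast concrete, here is a minimal sketch. The Spark side is shown only as a comment (it needs a `SparkContext`); for the hand-rolled side I use `scala.concurrent.Future` from the standard library as a stand-in for the kind of explicit partition-and-gather work you would otherwise build yourself with Akka actors. The partition size and data are made up for illustration:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// With Spark's RDD abstraction this whole job is one line:
//   sc.parallelize(data).map(_ * 2).reduce(_ + _)
// Without it, you manage partitioning and aggregation yourself
// (with Akka you would do this via actors and explicit messages):
val data       = (1 to 1000).toVector
val partitions = data.grouped(250).toVector                      // manual partitioning
val partials   = partitions.map(p => Future(p.map(_ * 2).sum))   // parallel "map" per partition
val total      = Await.result(Future.sequence(partials), 10.seconds).sum  // the "reduce" step
println(total)
```

The point is not that `Future` replaces Akka, but that below Spark's abstractions you take on the partitioning, scheduling, and aggregation logic yourself.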

waffle paradox
  • As this "Spark is/was built on Akka" myth persists, including in [SO answers to a very similar question](https://stackoverflow.com/a/29089468), let me [link to another SO answer about Akka and Spark](https://stackoverflow.com/a/37448951/1847419): Spark was *never* "built on Akka"; it only used it for its internal communication, and even that was abandoned, so that Akka clusters can more easily communicate with a Spark stream. – fnl Oct 26 '17 at 10:14
0

My take: if we have to process many small messages (millions of them), we can write the application directly on Akka. This should be faster than Spark. Please comment.

If the message data is very big (more than one JVM is needed), we need RDDs. Spark also has extra features that might be overhead in the first case.