8

I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.

Oskar Austegard
progrrammer

1 Answer

14

At heart, Cascading is a higher-level API on top of execution engines like MapReduce; it is analogous to Apache Crunch in this sense. Cascading also has several related projects, such as a Scala API (Scalding) and PMML model scoring (Pattern).
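
To make that concrete, here is roughly what the canonical word count looks like as a Cascading pipe assembly; a minimal sketch, assuming a Cascading 2.x dependency running on Hadoop, with input/output paths as placeholders:

```java
import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

import java.util.Properties;

public class CascadingWordCount {
  public static void main(String[] args) {
    String inputPath = args[0];   // placeholder: input text on HDFS
    String outputPath = args[1];  // placeholder: output directory

    // Taps define where data is read from and written to
    Tap source = new Hfs(new TextLine(), inputPath);
    Tap sink = new Hfs(new TextLine(), outputPath, SinkMode.REPLACE);

    // A pipe assembly describes the logical data flow:
    // split each line into words, group by word, count each group
    Pipe pipe = new Pipe("wordcount");
    pipe = new Each(pipe, new Fields("line"),
        new RegexSplitGenerator(new Fields("word"), "\\s+"));
    pipe = new GroupBy(pipe, new Fields("word"));
    pipe = new Every(pipe, new Count(new Fields("count")));

    // Cascading plans the assembly into one or more MapReduce jobs
    Flow flow = new HadoopFlowConnector(new Properties())
        .connect("wordcount", source, sink, pipe);
    flow.complete();
  }
}
```

Note that you never write map or reduce functions directly; Cascading's planner translates the pipe assembly into MapReduce jobs for you.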

Apache Spark is similar in that it exposes a high-level API for data pipelines, one that is available in both Java and Scala.

Unlike Cascading, though, Spark is more of an execution engine itself than a layer on top of one. It has a number of associated projects, such as MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation, respectively.
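
For comparison, here is the same word count against Spark's Java RDD API; again just a sketch, assuming a Spark 2.x dependency (where flatMap returns an Iterator), placeholder paths, and a master set externally via spark-submit:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkWordCount {
  public static void main(String[] args) {
    String inputPath = args[0];   // placeholder: input text
    String outputPath = args[1];  // placeholder: output directory

    SparkConf conf = new SparkConf().setAppName("wordcount");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> lines = sc.textFile(inputPath);

      // The same logical pipeline, but executed by Spark's own engine,
      // which keeps intermediate results in memory between stages
      JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey(Integer::sum);

      counts.saveAsTextFile(outputPath);
    }
  }
}
```

The two pipelines have nearly the same shape; the practical difference is underneath. Cascading hands its plan to an execution engine such as MapReduce, while Spark is the engine.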

Overall I find Spark a lot more interesting today, but they're not exactly for the same thing.

Sean Owen
  • Cascading aims to support Spark as an "execution fabric". See http://www.cascading.org/new-fabric-support/ for more details. – btiernay Oct 12 '14 at 19:54
  • 4
    Spark would more properly compared to MapReduce, which contrasts in-memory processing (Spark) vs. disk-based processing (MapReduce). Cascading currently is just an interface for writing MapReduce jobs. – Tom Jan 05 '15 at 20:47
  • Any learnings that you can share if you moved code from Scalding/Cascading to Spark? – RamPrasadBismil Apr 26 '23 at 19:00