8

I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.

Oskar Austegard
progrrammer

1 Answer

14

At heart, Cascading is a higher-level API on top of execution engines like MapReduce; it is analogous to Apache Crunch in this sense. Cascading also has several related projects, such as a Scala API (Scalding) and PMML model scoring (Pattern).
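
To make that concrete, here is roughly what the canonical word count looks like as a Cascading pipe assembly; a minimal sketch, assuming a Cascading 2.x dependency running on Hadoop, with input/output paths as placeholders:

```java
import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

import java.util.Properties;

public class CascadingWordCount {
  public static void main(String[] args) {
    String inputPath = args[0];   // placeholder: input text on HDFS
    String outputPath = args[1];  // placeholder: output directory

    // Taps define where data is read from and written to
    Tap source = new Hfs(new TextLine(), inputPath);
    Tap sink = new Hfs(new TextLine(), outputPath, SinkMode.REPLACE);

    // A pipe assembly describes the logical data flow:
    // split each line into words, group by word, count each group
    Pipe pipe = new Pipe("wordcount");
    pipe = new Each(pipe, new Fields("line"),
        new RegexSplitGenerator(new Fields("word"), "\\s+"));
    pipe = new GroupBy(pipe, new Fields("word"));
    pipe = new Every(pipe, new Count(new Fields("count")));

    // Cascading plans the assembly into one or more MapReduce jobs
    Flow flow = new HadoopFlowConnector(new Properties())
        .connect("wordcount", source, sink, pipe);
    flow.complete();
  }
}
```

Note that you never write map or reduce functions directly; Cascading's planner translates the pipe assembly into MapReduce jobs for you.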

Apache Spark is similar in that it exposes a high-level API for data pipelines, one that is available in both Java and Scala.

Unlike Cascading, though, Spark is more of an execution engine itself than a layer on top of one. It has a number of associated projects, such as MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation, respectively.
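
For comparison, here is the same word count against Spark's Java RDD API; again just a sketch, assuming a Spark 2.x dependency (where flatMap returns an Iterator), placeholder paths, and a master set externally via spark-submit:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkWordCount {
  public static void main(String[] args) {
    String inputPath = args[0];   // placeholder: input text
    String outputPath = args[1];  // placeholder: output directory

    SparkConf conf = new SparkConf().setAppName("wordcount");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> lines = sc.textFile(inputPath);

      // The same logical pipeline, but executed by Spark's own engine,
      // which keeps intermediate results in memory between stages
      JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey(Integer::sum);

      counts.saveAsTextFile(outputPath);
    }
  }
}
```

The two pipelines have nearly the same shape; the practical difference is underneath. Cascading hands its plan to an execution engine such as MapReduce, while Spark is the engine.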

Overall I find Spark a lot more interesting today, but they're not exactly for the same thing.

Sean Owen
  • Cascading aims to support Spark as an "execution fabric". See http://www.cascading.org/new-fabric-support/ for more details. – btiernay Oct 12 '14 at 19:54
  • 4
    Spark would more properly compared to MapReduce, which contrasts in-memory processing (Spark) vs. disk-based processing (MapReduce). Cascading currently is just an interface for writing MapReduce jobs. – Tom Jan 05 '15 at 20:47
  • Any learnings that you can share if you moved code from Scalding/Cascading to Spark? – RamPrasadBismil Apr 26 '23 at 19:00