I notice that the Cascalog getting started guide specifies a version of Hadoop:

:profiles { :dev {:dependencies [[org.apache.hadoop/hadoop-core "1.0.3"]]}}

If my group uses a different version of Hadoop, am I out of luck? More broadly, with what set of Hadoop versions does Cascalog interoperate?

MRocklin
  • The Cascading compatibility matrix is here: http://www.cascading.org/support/compatibility/. Other distributions might work but aren't officially supported. – Alex Sep 06 '13 at 17:53
  • @Alex that's a nice chart. Do Cascalog and Cascading's support match exactly? – MRocklin Sep 07 '13 at 04:23

1 Answer

The simple answer is that currently (as of Aug 10 2014) Cascalog is at version 2.1.1 and by default uses Cascading 2.5.3 and Hadoop 1.2.1. So yes, if your team is not using Hadoop 1.x, you're out of luck with the stock release.

However, Cascalog could be ported to Hadoop 2.x, because Cascading 2.5.x supports Hadoop 2. From the Cascading docs, under "Hadoop 1 vs Hadoop 2":

Cascading 2.5 supports both Hadoop 1.x and 2.x by providing two Java dependencies, cascading-hadoop.jar and cascading-hadoop2-mr1.jar. These dependencies can be interchanged but the hadoop2-mr1.jar introduces new APIs and deprecates older API calls where appropriate. It should be pointed out hadoop2-mr1.jar only supports MapReduce 1 API conventions. With this naming scheme new API conventions can be introduced without risk of naming collisions on dependencies.

The following is a naive guide for updating Cascalog to Hadoop 2.x:

  • Replace the cascading-hadoop jar with cascading-hadoop2-mr1 in the project file.
  • Update the Hadoop version in the HADOOP-VERSION config file.
  • Find all uses of deprecated Cascading API calls and update them to the new conventions.
  • Compile and fix warnings/errors.
  • Repeat until the build is clean.
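As a rough illustration of the first two steps, here is an untested sketch of what the dependency swap might look like in a Leiningen project.clj that consumes Cascalog, rather than in Cascalog's own build. The project name, the Clojure version, and the Hadoop 2 version are placeholders I picked for illustration, and I'm assuming cascalog-core pulls in cascading/cascading-hadoop transitively so that it can be excluded. I have not verified that Cascalog actually runs this way.

```clojure
;; Hypothetical project.clj: names and versions marked below are assumptions.
(defproject my-cascalog-job "0.1.0-SNAPSHOT"   ; placeholder project name
  ;; Cascading artifacts are published to the Conjars repository.
  :repositories {"conjars" "http://conjars.org/repo"}
  :dependencies [[org.clojure/clojure "1.5.1"]  ; placeholder Clojure version
                 ;; Drop the Hadoop 1 flavour of Cascading that Cascalog brings in
                 ;; (assumes it arrives as a transitive dependency).
                 [cascalog/cascalog-core "2.1.1"
                  :exclusions [cascading/cascading-hadoop]]
                 ;; Use the Hadoop 2 (MapReduce 1 API) flavour instead.
                 [cascading/cascading-hadoop2-mr1 "2.5.3"]]
  ;; Same :dev profile shape as in the getting started guide, but with a
  ;; Hadoop 2 client; "2.4.0" is only an example version.
  :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-client "2.4.0"]]}})
```

If this compiles at all, expect deprecation warnings or errors wherever Cascalog touches Cascading API calls that differ between the two jars; those are the spots the remaining steps are about.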

I'm no expert in the Cascalog source, but uses of the Cascading API can be found with a few lines of grep, and upgrading the API calls seems pretty straightforward, if a little tedious.

Daniel Canas
  • It's been almost a year, does anyone know if anything has changed on this yet? It seems folks have to be running Cascalog with the later versions of Cascading at this point, but I can't find any docs/articles to suggest this. – joefromct Jul 27 '15 at 16:21
  • As far as I know, nothing has changed on this front. Looking at the 3.0.0 changelog, https://github.com/nathanmarz/cascalog/blob/develop/CHANGELOG.md does not reveal any plans for bumping the Cascading or Hadoop versions. – Daniel Canas Jul 28 '15 at 14:25