
I'm trying to build the Apache Crunch source code on my CentOS 7 machine, but am getting the following error in the crunch-spark project when I execute mvn package:

[ERROR] /home/bwatson/programming/git/crunch/crunch-spark/src/it/scala/org/apache/crunch/scrunch/spark/PageRankClassTest.scala:71: error: bad symbolic reference. A signature in PTypeH.class refers to term protobuf
[ERROR] in package com.google which is not available.
[ERROR] It may be completely missing from the current classpath, or the version on
[ERROR] the classpath might be incompatible with the version used when compiling PTypeH.class.
[ERROR]       .map(line => { val urls = line.split("\\t"); (urls(0), urls(1)) })
[ERROR]           ^

Other SO questions about similar errors (here and here) seem to involve PATH or version issues. I've been messing around but can't seem to resolve them. For completeness:

[bwatson@ben-pc crunch]$ scala -version
Scala code runner version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

[bwatson@ben-pc crunch]$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

[bwatson@ben-pc crunch]$ mvn -version
Apache Maven 3.0.5 (Red Hat 3.0.5-16)
Maven home: /usr/share/maven
Java version: 1.8.0_31, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_31/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-123.20.1.el7.x86_64", arch: "amd64", family: "unix"

Any advice? I'm not really sure where Scala is looking for its dependencies, but I'd have thought that Maven would take care of it.

Ben Watson

2 Answers


Unfortunately, different versions of Scala are binary incompatible. By default, Apache Spark currently targets Scala 2.10.4, not Scala 2.11, and Scrunch depends on Spark. Maven knows nothing about this constraint, so it can't help. Getting Scrunch to compile for Scala 2.11 / JDK 1.8 requires some modifications; I am working on this at the moment, but I don't have a solution yet. However, I get the error message you report when I compile with Scala 2.10.4 and JDK 1.8, not Scala 2.11, so I don't think your build is doing quite what you intend. The error seems to be coming from the protobuf compiler or jar, but I don't know why that is.

When I solve it myself, I will report back!
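To confirm which Scala binaries the build actually resolves (and so whether the 2.10 vs 2.11 mismatch described above applies), Maven's dependency plugin can list them. This is a suggested diagnostic, not part of either answer; the `crunch-spark` module name is taken from the error in the question:

```shell
# List the Scala artifacts Maven resolves for the crunch-spark module.
# A 2.10.x scala-library here, while a 2.11 compiler is installed locally,
# would be consistent with the binary-incompatibility theory above.
mvn -pl crunch-spark dependency:tree -Dincludes=org.scala-lang
```

Note that the locally installed `scala -version` (2.11.5 in the question) is irrelevant to a Maven build; only the versions Maven resolves matter.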

Mark Butler
  • I'm very sorry - I thought I had closed this question a while ago. I solved the issue - there was a Maven parameter missing from the official Crunch documentation. I've added and ticked my own answer now. I hope it helps you. – Ben Watson May 11 '15 at 16:42
  • OK, great, thanks Ben, with the switch my patch does work. However I think the version you are building, even though you have Scala 2.11.5 installed, will be for Scala 2.10.4. As this is the default version that Spark targets, that's probably good. But if you need a 2.11 version, things are a bit more involved. – Mark Butler May 11 '15 at 16:51

It turns out the official documentation for Crunch was missing a Maven parameter. The issue was solved by building using:

mvn package -Dcrunch.platform=2
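
For anyone hitting the same `com.google.protobuf` symptom, it can also help to check which protobuf artifact the module pulls in. This command is a suggested diagnostic, not part of the original fix:

```shell
# Show where com.google.protobuf comes from in the crunch-spark module's
# dependency tree, to spot a missing or conflicting protobuf version.
mvn -pl crunch-spark dependency:tree -Dincludes=com.google.protobuf
```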
Ben Watson
  • [That seems to be selecting the version of Hadoop to use](https://github.com/apache/crunch/blob/master/pom.xml#L47) – Mark Butler May 11 '15 at 16:46
  • Yes, I should clarify that this solution has fixed my particular issue, whereas you're looking to get to the root of the wider problem. – Ben Watson May 11 '15 at 17:34
  • For anyone else encountering this, here is the [relevant Crunch issue](https://issues.apache.org/jira/browse/CRUNCH-518) – Mark Butler May 11 '15 at 20:17