
I'm running Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.

The Hive on Spark: Getting Started guide says:

Install/build a compatible version. Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with.

I checked the pom.xml, and it shows that the Spark version is 1.6.0:

<spark.version>1.6.0</spark.version>
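
(For reference, that value can be grepped straight out of the root pom.xml of the Hive source tree; a minimal sketch, run from the root of the Hive source checkout:

grep -m1 '<spark.version>' pom.xml
# for Hive 2.1.1 this prints the <spark.version>1.6.0</spark.version> line above
)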

But Hive on Spark: Getting Started also says:

Prior to Spark 2.0.0: ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

Since Spark 2.0.0: ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

So now I'm confused: the pre-2.0.0 command builds against the hadoop-2.4 profile, but I am running Hadoop 2.7.3. Do I have to downgrade my Hadoop to 2.4?

Which version of Spark should I use? 1.6.0 or 2.0.0?

Thank you!

Top.Deck

2 Answers


I am currently using Spark 2.0.2 with Hadoop 2.7.3 and Hive 2.1, and it's working fine. I think Hive supports both Spark 1.6.x and 2.x, but I suggest you go with Spark 2.x since it's the latest version.
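
For context, switching Hive's execution engine over to Spark is just a handful of settings, roughly as described in Hive on Spark: Getting Started. A minimal sketch from the shell; the master URL, memory size, and table name are placeholders, not values from a real setup:

# point Hive at the Spark engine and run one query to verify the wiring
hive -e "
set hive.execution.engine=spark;
set spark.master=spark://your-master-host:7077;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
select count(*) from your_table;
"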

Some motivation for why to use Spark 2.x: https://docs.cloud.databricks.com/docs/latest/sample_applications/04%20Apache%20Spark%202.0%20Examples/03%20Performance%20Apache%20(Spark%202.0%20vs%201.6).html

Apache Spark vs Apache Spark 2

siddhartha jain
  • I tried Spark 1.6.0 and it's working. I will test Spark 2.0.2 as you suggested. – Top.Deck Feb 17 '17 at 15:33
  • Let me know if you face any issues. – siddhartha jain Feb 17 '17 at 15:47
  • @siddharthajain, could you please share your detailed steps for configuring Hive to run on Spark? I'm trying to run Hive (2.1.1) on Spark (2.1.0) but failed. I start Spark in standalone mode, and start Hive with the command: hive --auxpath $HOME/Tools/spark-2.1.0-bin-hadoop2.7/jars/, then set Hive with the commands: set hive.execution.engine=spark; set spark.master=spark://10.0.0.26:7077; hive> set spark.eventLog.enabled=true; hive> set spark.eventLog.dir=/tmp/hive-shizhz/spark/; hive> set spark.executor.memory=512m; hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer; – shizhz Feb 24 '17 at 08:04
  • @siddharthajain But I always got this error from the Hive query: FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client. – shizhz Feb 24 '17 at 08:09
  • Seems it's caused by: java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener – shizhz Feb 24 '17 at 08:36
  • @shizhz I get the same Failed to create spark client. I'm working on it >_ – Top.Deck Feb 24 '17 at 15:10
  • @Top.Deck, finally got these two working together, it's Hive 2.1.1 with Spark 1.6.0, Hadoop 2.6.5. – shizhz Feb 26 '17 at 10:47
  • @shizhz congrats! I've got it work with Spark 1.6.0, but I wanna try Spark 2.1.0 with Hive... – Top.Deck Feb 26 '17 at 16:15
  • @Top.Deck, guess we have to wait for Hive 2.2; the spark.version in the pom.xml of Hive master is 2.0. Anyway, if you work it out with Hive 2.1, please let me know how. Thanks. – shizhz Feb 27 '17 at 07:05

The current version of Spark 2.x is not compatible with Hive 2.1 and Hadoop 2.7; there is a major bug:

JavaSparkListener is not available and Hive crashes on execution

https://issues.apache.org/jira/browse/SPARK-17563
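
You can verify this against whatever Spark build Hive is pointed at. A hedged sketch; the jar locations are the standard layouts (Spark 1.6 ships lib/spark-assembly-*.jar, while Spark 2.x ships a jars/ directory):

# Spark 1.6.x: the class is present in the assembly jar
jar tf $SPARK_HOME/lib/spark-assembly-*.jar | grep org/apache/spark/JavaSparkListener
# Spark 2.x: it was removed from spark-core, so this prints nothing
jar tf $SPARK_HOME/jars/spark-core_*.jar | grep org/apache/spark/JavaSparkListener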

You can try to run Hive 2.1 on Hadoop 2.7 with Spark 1.6, building Spark with:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided" 

If you look at the command for Spark 2.0 and later, the only difference is that make-distribution.sh has moved into the dev/ folder.

If it does not work for Hadoop 2.7.x, I can confirm that I have been able to build it successfully with Hadoop 2.6, using:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided" 

and with Scala 2.10.5.
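
Once the build finishes you still have to hand the result to Hive. A minimal sketch of that last step (the tarball name follows from the --name flag above and the install path is an assumption; prior to Hive 2.2.0, the Getting Started guide has you link the spark-assembly jar into Hive's lib directory):

tar -xzf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /opt
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2-without-hive
# prior to Hive 2.2.0, Hive picks up Spark through the assembly jar
ln -s $SPARK_HOME/lib/spark-assembly-*.jar $HIVE_HOME/lib/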

chuseuiti