
I'm trying to build a cube in Kylin with Spark as the engine type. The cluster contains the following components:

OS image: 1.0-debian9

Apache Spark 2.4.4 (changed from 1.6.2)

Apache Hadoop 2.7.4

Apache Hive 1.2.1

I'm getting this error while building a cube:

java.lang.NoSuchMethodError: org.apache.hive.common.util.ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V
    at org.apache.hive.hcatalog.common.HiveClientCache.createShutdownHook(HiveClientCache.java:221)
    at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:153)
    at org.apache.hive.hcatalog.common.HiveClientCache.<init>(HiveClientCache.java:97)
    at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:553)
    at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
    at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
    at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
    at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
    at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:80)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:131)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I checked the Hive and Hadoop library jar directories for redundant jars and found two versions of each jar, for example hive-common-1.2.1.jar and hive-common.jar.

I tried moving one of them to a different location and resuming the cube build, but I got the same error. Any help on this would be greatly appreciated.

Arjun A J

2 Answers


This is not a supported use case for Dataproc: if you need Spark 2.4.4, you should use Dataproc 1.4 or 1.5 instead of Dataproc 1.0, which comes with Spark 1.6.2.
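
For reference, a minimal sketch of creating such a cluster with the gcloud CLI (the cluster name and region below are placeholders, not taken from the question):

$ gcloud dataproc clusters create kylin-cluster \
    --image-version=1.4-debian9 \
    --region=us-central1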

Aside from this, the ShutdownHookManager.addShutdownHook(Ljava/lang/Runnable;)V method was added in Hive 2.3.0, but Spark uses a fork of Hive 1.2.1; that is why you need to use a Kylin version that supports Hive 1.2.1.
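
A quick way to check which variant of the method the hive-common jar on the classpath actually provides is to disassemble the class with javap (a diagnostic sketch; the jar path follows the Dataproc layout shown below). On a Hive 1.2.1 jar you would expect to see only overloads taking extra parameters, not the single-Runnable variant the stack trace is calling:

$ javap -classpath /usr/lib/hive/lib/hive-common.jar \
    org.apache.hive.common.util.ShutdownHookManager | grep addShutdownHook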

Regarding the duplicate jars: the version-less hive-common.jar is not a duplicate, it's a symbolic link to the versioned hive-common-1.2.1.jar. You can verify this by listing it:

$ ls -al /usr/lib/hive/lib/hive-common.jar
lrwxrwxrwx 1 root root 21 Nov  9 09:20 /usr/lib/hive/lib/hive-common.jar -> hive-common-2.3.6.jar
Igor Dvorzhak
  • @Igor Dvorzhak, thanks for your quick response. Earlier I used the Dataproc 1.4 Debian image with these specifications: Apache Spark 2.4.4, Apache Hive 2.3.6, Apache Hadoop 2.9.2. While building a cube with that image, I got the error NoSuchMethodError: org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertToProtoFormat. The likely reason is that the Hadoop version used to compile the application (2.7) differs from the Hadoop version on the cluster (2.9). That is why I went with Dataproc 1.0 yesterday, which has Hadoop 2.7.4. Please suggest the right approach to use here. – Arjun A J Dec 18 '19 at 04:39
  • How do you install Kylin? If you are compiling it, you need to override the Hadoop, Yarn, Spark and Hive versions in the [Kylin Maven build](https://github.com/apache/kylin/blob/master/pom.xml#L68:L83) to match the Dataproc-provided versions (a sketch of such a build command follows below). – Igor Dvorzhak Dec 18 '19 at 06:50
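
For illustration, such a version-override build might look like the following. The property names (hadoop2.version, hive.version, spark.version) are assumptions here, so verify them against the properties section of the linked pom.xml before building:

$ mvn clean package -DskipTests \
    -Dhadoop2.version=2.9.2 \
    -Dhive.version=2.3.6 \
    -Dspark.version=2.4.4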

I changed the Hive version to 2.1.0 and it worked for me. I settled on this version of Hive by checking the Kylin download page and looking at the Hive versions that other cloud platforms, such as AWS EMR and Microsoft Azure HDInsight, ship for the Kylin 2.6.4 release.
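
After changing the version, a quick sanity check that the cluster actually picks up the intended Hive build (a small sketch, run on a cluster node; the jar path matches the layout from the accepted answer):

$ hive --version
$ ls -al /usr/lib/hive/lib/hive-common*.jar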

Thanks, @Igor Dvorzhak, for your valuable suggestions.

Arjun A J