How to execute Hive queries on Hive 2.1.1 on Spark 2.2.0?

Question

Simple queries, e.g. select, work fine, but when I use aggregate functions, e.g. count, I face errors.

I use beeline to connect to Hive 2.1.1 with Spark 2.2.0 and Hadoop 2.8.

hive-site.xml is as follows:

<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>
<property>
<name>spark.master</name>
<value>spark://master:7077</value>
<description>Spark Master URL</description>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
<description>Spark Event Log</description>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>hdfs://master:8020/user/spark/eventLogging</value>
<description>Spark event log folder</description>
</property>
<property>
<name>spark.executor.memory</name>
<value>512m</value>
<description>Spark executor memory</description>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
<description>Spark serializer</description>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://master:9000:/user/spark/spark-jars/*</value>
</property>
<property>
<name>spark.master</name>
<value>spark://master:7077</value>
<description>Spark Master URL</description>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
<description>Spark Event Log</description>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>hdfs://master:8020/user/spark/eventLogging</value>
<description>Spark event log folder</description>
</property>
<property>
<name>spark.executor.memory</name>
<value>512m</value>
<description>Spark executor memory</description>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
<description>Spark serializer</description>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://master:9000:/user/spark/spark-jars/*</value>
</property>

When executing select count(*) from table in hive I get the below error:

WARN thrift.ThriftCLIService: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:225) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295) ~[hive-service-2.1.1.jar:2.1.1]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506) [hive-service-2.1.1.jar:2.1.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
        at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1412) [hive-jdbc-2.1.1.jar:2.1.1]
        at com.sun.proxy.$Proxy35.ExecuteStatement(Unknown Source) [?:?]
        at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:308) [hive-jdbc-2.1.1.jar:2.1.1]
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250) [hive-jdbc-2.1.1.jar:2.1.1]

Can you execute any simpler query? What tool do you use to execute queries? beeline? Is this some kind of Hadoop distribution like Cloudera's CDH or HDP or MapR's? — Jacek Laskowski, Jan 03 '18 at 09:41
Simple queriers like select are working. But when we use aggregate functions in query , we are facing error. Yes we use beeline to connect and using Apache Hadoop — chaithanyaa mallamla, Jan 03 '18 at 09:55
How did you configure Hive on Spark? Any tutorials, blog posts or other documents? The error says that it can't find Scala classes that tells me that Hive on Spark is misconfigured and not ready to execute queries using Spark as the query engine. — Jacek Laskowski, Jan 03 '18 at 10:50
Followed https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started But here says Prior to Hive 2.2.0, link the spark-assembly jar to HIVE_HOME/lib. There is no spark-assembly jar in hive 2.1.1 https://stackoverflow.com/questions/42373745/hive-on-spark-missing-spark-assembly-jar Help me which versions are compatible both apache hive and apache spark — chaithanyaa mallamla, Jan 03 '18 at 15:09
select count(*) from student; Query ID = hadoop_20180208184224_f86b5aeb-f27b-4156-bd77-0aab54c0ec67 Total jobs = 1 Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=1) — chaithanyaa mallamla, Feb 08 '18 at 13:21
Can you edit your question and add the other comment and the exception. I think what they refer to as `spark-assembly.jar` is the assembly from Spark and is not shipped with Hive. — Jacek Laskowski, Feb 08 '18 at 14:53

How to execute Hive queries on Hive 2.1.1 on Spark 2.2.0?

0 Answers0