Simple queries, e.g. select, work fine, but when I use aggregate functions, e.g. count, I get errors.
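For instance, a plain projection like the first statement below returns results, while the aggregate in the second one fails (test_table here just stands in for my actual table):

-- works: simple projection, no aggregation, served as a plain fetch
select * from test_table limit 10;

-- fails with the error shown further down: the aggregate triggers a Spark job
select count(*) from test_table;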
I use beeline to connect to Hive 2.1.1 with Spark 2.2.0 and Hadoop 2.8. hive-site.xml is as follows:
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
  <description>
    Expects one of [mr, tez, spark].
    Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
    remains the default engine for historical reasons, it is itself a historical engine
    and is deprecated in Hive 2 line. It may be removed without further warning.
  </description>
</property>
<property>
  <name>spark.master</name>
  <value>spark://master:7077</value>
  <description>Spark Master URL</description>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
  <description>Spark Event Log</description>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>hdfs://master:8020/user/spark/eventLogging</value>
  <description>Spark event log folder</description>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
  <description>Spark executor memory</description>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
  <description>Spark serializer</description>
</property>
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://master:9000:/user/spark/spark-jars/*</value>
</property>
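As a side note, I believe the execution engine can also be inspected or overridden per session from beeline with the standard set syntax (shown here only for reference; the property name matches the one configured above):

-- print the current value of the setting in this session
set hive.execution.engine;

-- force Spark as the engine for this session only
set hive.execution.engine=spark;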
When executing select count(*) from table in Hive, I get the following error:
WARN thrift.ThriftCLIService: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:225) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295) ~[hive-service-2.1.1.jar:2.1.1]
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506) [hive-service-2.1.1.jar:2.1.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1412) [hive-jdbc-2.1.1.jar:2.1.1]
at com.sun.proxy.$Proxy35.ExecuteStatement(Unknown Source) [?:?]
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:308) [hive-jdbc-2.1.1.jar:2.1.1]
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250) [hive-jdbc-2.1.1.jar:2.1.1]