I am trying to configure a 5-node Cassandra cluster to run Spark/Shark so I can test some Hive queries. I have installed Spark, Scala, and Shark and configured them according to the AMPLab guide [Running Shark on a Cluster] https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster.

I am able to get into the Shark CLI, but when I try to create an EXTERNAL TABLE from one of my Cassandra ColumnFamily tables, I keep getting this error:

Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.cassandra.CassandraStorageHandler

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I have configured HIVE_HOME, HADOOP_HOME, and SCALA_HOME. Perhaps I'm pointing HIVE_HOME and HADOOP_HOME to the wrong paths? HADOOP_HOME is set to my Cassandra Hadoop folder (/etc/dse/cassandra), HIVE_HOME is set to the unpacked AMPLab download of Hadoop1/hive, and HIVE_CONF_DIR is set to my Cassandra Hive path (/etc/dse/hive). Am I missing any steps, or have I configured these locations wrongly? Any help will be very much appreciated. Thanks.

kwasbob
  • Where did you put the Cassandra storage handler jar? You might need to add it with the 'add jar' command in shark. – Richard Nov 15 '13 at 11:07
  • Thanks for the reply, Richard. I have searched the whole of one Cassandra node for the storage handler jar file but I can't find one. It must exist, because I can run Hive queries from the Hive CLI using DataStax. Is that file called anything other than 'StorageHandler.jar'? – kwasbob Nov 15 '13 at 11:45
  • You could inspect which jars Hive loads when it runs under DSE. Then you can use 'add jar' or copy it to the hive lib dir that you are running. Alternatively, you can build the open source cassandra storage handler from https://github.com/milliondreams/hive. – Richard Nov 15 '13 at 12:00
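Since the jar may not be named after the handler, one way to follow up on the comments above is to search jar contents rather than file names. The following is a minimal sketch, not part of the original thread: it assumes Python 3 is available on the node, and the function name `find_jars_with_class` is made up for illustration. The only value taken from the question is the class name in the error message.

```python
import os
import sys
import zipfile

# Class named in the error message, written as the path it would have inside a jar.
HANDLER_ENTRY = "org/apache/hadoop/hive/cassandra/CassandraStorageHandler.class"


def find_jars_with_class(root, entry=HANDLER_ENTRY):
    """Return the sorted paths of .jar files under root that contain entry."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                # A jar is a zip archive, so its entries can be listed directly.
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        matches.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip corrupt or unreadable archives instead of aborting the scan.
                continue
    return sorted(matches)


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory (an assumption).
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for jar_path in find_jars_with_class(search_root):
        print(jar_path)
```

Pointing this at the directory where DSE is installed should list any jar that bundles the handler class. A jar found this way is the one to pass to 'add jar' in Shark or to copy into the Hive lib directory, as suggested in the comments.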

1 Answer

Yes, I have got this working.

Try https://github.com/2013Commons/hive-cassandra, which works with Cassandra 2.0.4, Hive 0.11, and Hadoop 2.0.

ahll