0

Installed Spark 1.5 (spark-1.5.0-bin-hadoop2.6) on my local machine. Ran $ ./bin/spark-shell. Tried, following the doc, to create a table, and I get this:

> SQL context available as sqlContext.
> 
> scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value
> STRING)"); 15/09/22 22:18:13 ERROR DDLTask:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> MetaException(message:file:/user/hive/warehouse/src is not a directory
> or unable to create one)  at
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)

Tried passing the hive parameter for this, but it didn't work:

> $  ./bin/spark-shell --conf hive.metastore.warehouse.dir=./ Warning:
> Ignoring non-spark config property: hive.metastore.warehouse.dir=./

Finally I tried the CLI itself, but got the same issue. Where do I change the Hive warehouse location parameter? I don't have Hadoop installed at the moment, nor Hive.

thanks, Matt

MattLieber
  • I am running into the same error. How did you resolve this? I am not sure if i follow the answer below. – G3M Oct 06 '15 at 22:18
  • actually i didn't, i figured i needed a Hive install. However, re-reading the doc, it says "To use a HiveContext, you do not need to have an existing Hive setup" from https://spark.apache.org/docs/1.5.0/sql-programming-guide.html so i am definitely confused.. – matthieu lieber Oct 08 '15 at 01:03

3 Answers


Metadata for Hive tables is stored in the metastore; HiveContext adds support for finding tables registered in the metastore.

import org.apache.spark.sql.hive.HiveContext

// HiveContext wraps the existing SparkContext (sc in spark-shell)
// and knows how to look tables up in the metastore.
val hiveContext = new HiveContext(sc)
val myDF = hiveContext.sql("select * from mytable")

You will get a DataFrame as the result:

myDF: org.apache.spark.sql.DataFrame = [.....]
WoodChopper
    Thanks - however, on the 2nd line i get: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@7a92934 Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /Users/mlieber/app/spark-1.5.0-bin-hadoop2.6/metastore_db. – MattLieber Sep 23 '15 at 15:37
  • My bad, I missed reading the complete question: "No hive". Spark SQL on its own is not a DW/DB; it is a query/computation engine. But you can create a DataFrame, save and query it, or query a JSON file and a few other formats. – WoodChopper Sep 24 '15 at 04:19
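Following up on that last comment: even without Hive, you can query a JSON file through the plain SQLContext that spark-shell already provides, with no metastore involved. A minimal sketch for Spark 1.5, assuming a line-delimited JSON file named people.json with name and age fields (both the file and its fields are assumptions for illustration):

```scala
// Inside spark-shell, sc and sqlContext are already defined.
val people = sqlContext.read.json("people.json")

// Register the DataFrame as a temporary table; this lives only in the
// session, so no Hive metastore or warehouse directory is required.
people.registerTempTable("people")

val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
adults.show()
```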

I hit this problem when spark-shell didn't have write access to /user/hive/warehouse.

  1. Try sudo spark-shell. If it works, do the second step.
  2. Change the access rights of the directory so that the user running the plain spark-shell command can write to it.
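Step 2 above amounts to making the warehouse directory writable by whoever runs spark-shell. A sketch on a scratch directory (the real path from the error is /user/hive/warehouse; on that path you would likely need sudo, since the scratch directory here is only an assumption for illustration):

```shell
# Scratch directory stands in for /user/hive/warehouse;
# prefix the commands with sudo if root owns the real path.
WAREHOUSE=./warehouse_demo
mkdir -p "$WAREHOUSE"

# Give the current user read/write/execute on the whole tree.
chmod -R u+rwx "$WAREHOUSE"
ls -ld "$WAREHOUSE"
```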
Li Ping

Actually you don't really have to have Hive installed (nor Hadoop), but you do need a hive-site.xml present on your Spark classpath (the simplest way is to add hive-site.xml to your Spark conf directory).

Here is a simple default hive-site.xml:

<configuration>
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/PATH/TO/YOUR/METASTORE/DIR/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>org.apache.derby.jdbc.EmbeddedDriver</value>
   <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/PATH/TO/YOUR/WAREHOUSE/DIR/</value>
    <description>location of default database for the warehouse</description>
</property>
</configuration>
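For the file above to take effect, it only has to end up in Spark's conf directory. One way to do that, sketched with a heredoc and abbreviated to just the warehouse property (the SPARK_HOME path matching the distribution from the question is an assumption):

```shell
# Write a minimal hive-site.xml straight into Spark's conf directory.
SPARK_HOME=./spark-1.5.0-bin-hadoop2.6
mkdir -p "$SPARK_HOME/conf"
cat > "$SPARK_HOME/conf/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/tmp/hive/warehouse</value>
  </property>
</configuration>
EOF
# spark-shell will now pick the file up from its classpath.
```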

Sometimes, when the metastore is a local Derby database, it may hold locks that have not been deleted. If you are experiencing a problem with metastore locks, you can delete the lock files (make sure it is just you who is using the metastore first ;) ):

$ rm  /PATH/TO/YOUR/METASTORE/DIR/metastore_db/*.lck
user1314742