
I am trying to access Hive tables from an Eclipse Maven project with a Scala nature.

I tried using a HiveContext to read the Hive database details as shown below, but I am hitting the error further down. The same code runs fine in the spark-shell CLI, yet I cannot get it to work in the Eclipse Scala IDE with the Maven dependencies added.

Below is my code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

object readHiveTable {
  def main(args: Array[String]){
    val conf = new SparkConf().setAppName("Read Hive Table").setMaster("local")
    conf.set("spark.ui.port","4041")
    val sc = new SparkContext(conf)
    //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.uris","thrift://127.0.0.1:9083")
    hc.sql("use default")
    val a = hc.sql("show tables")
    a.show
  }
}

Below is the error I am facing in my console window:

18/02/04 19:58:15 INFO SparkUI: Started SparkUI at http://192.168.0.10:4041
18/02/04 19:58:15 INFO Executor: Starting executor ID driver on host localhost
18/02/04 19:58:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36099.
18/02/04 19:58:15 INFO NettyBlockTransferService: Server created on 36099
18/02/04 19:58:15 INFO BlockManagerMaster: Trying to register BlockManager
18/02/04 19:58:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36099 with 744.4 MB RAM, BlockManagerId(driver, localhost, 36099)
18/02/04 19:58:15 INFO BlockManagerMaster: Registered BlockManager
18/02/04 19:58:17 INFO HiveContext: Initializing execution hive, version 1.2.1
18/02/04 19:58:17 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
18/02/04 19:58:17 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
18/02/04 19:58:17 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/02/04 19:58:17 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/02/04 19:58:17 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
18/02/04 19:58:17 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/02/04 19:58:17 INFO ObjectStore: ObjectStore, initialize called
18/02/04 19:58:17 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/02/04 19:58:17 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/02/04 19:58:28 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:39 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/02/04 19:58:39 INFO ObjectStore: Initialized ObjectStore
18/02/04 19:58:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/02/04 19:58:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/02/04 19:58:41 INFO HiveMetaStore: Added admin role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: Added public role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_all_databases
18/02/04 19:58:41 INFO audit: ugi=chaithu   ip=unknown-ip-addr  cmd=get_all_databases   
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/02/04 19:58:41 INFO audit: ugi=chaithu   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
18/02/04 19:58:41 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
    at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
    at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
    at com.CITIGenesis.readHiveTable$.main(readHiveTable.scala:13)
    at com.CITIGenesis.readHiveTable.main(readHiveTable.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 12 more
18/02/04 19:58:43 INFO SparkContext: Invoking stop() from shutdown hook
18/02/04 19:58:43 INFO SparkUI: Stopped Spark web UI at http://192.168.0.10:4041
18/02/04 19:58:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/04 19:58:43 INFO MemoryStore: MemoryStore cleared
18/02/04 19:58:43 INFO BlockManager: BlockManager stopped
18/02/04 19:58:43 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/04 19:58:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/02/04 19:58:43 INFO SparkContext: Successfully stopped SparkContext
18/02/04 19:58:43 INFO ShutdownHookManager: Shutdown hook called
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ec5892a-1d53-4721-b770-d16e8757865d
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ca97c02-57c7-400b-b552-44f6d7813da5

HDFS directory listing:

chaithu@localhost:~$ hadoop fs -ls /tmp
Found 3 items
d---------   - hdfs   supergroup          0 2018-02-04 14:15 /tmp/.cloudera_health_monitoring_canary_files
drwxrwxrwx   - hdfs   supergroup          0 2018-01-31 11:42 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2018-01-31 11:25 /tmp/logs
chaithu@localhost:~$ hadoop fs -ls /user/
Found 6 items
drwxrwxrwx   - chaithu supergroup          0 2018-02-04 19:34 /user/chaithu
drwxrwxrwx   - mapred  hadoop              0 2018-01-31 11:25 /user/history
drwxrwxr-t   - hive    hive                0 2018-01-31 11:31 /user/hive
drwxrwxr-x   - hue     hue                 0 2018-01-31 11:38 /user/hue
drwxrwxr-x   - oozie   oozie               0 2018-01-31 11:34 /user/oozie
drwxr-x--x   - spark   spark               0 2018-01-31 22:39 /user/spark

1 Answer


for Hadoop version 2.2.0

Assuming that is actually the Spark version, you should be using SparkSession with enableHiveSupport(); the spark.sql method will then behave just as it does in the spark shell.

HiveContext and SQLContext are kept only for backwards compatibility; new Spark code should not use them.
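For reference, here is a minimal sketch of the program above rewritten with SparkSession (assuming Spark 2.x on the classpath and a hive-site.xml visible to the application; adjust the names to your setup):

import org.apache.spark.sql.SparkSession

object ReadHiveTable {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() replaces the old HiveContext; Hive settings are
    // normally picked up from hive-site.xml on the classpath.
    val spark = SparkSession.builder()
      .appName("Read Hive Table")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("use default")
    spark.sql("show tables").show()

    spark.stop()
  }
}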

underlying DB is DERBY

To me, this line means one of two things:

  1. Hive is using the default metastore configuration, or
  2. Spark is not connected to your metastore and has created a local Derby database instead, which would also explain the Failed to get database default warning.

In the latter case, check the /tmp folder of the local filesystem.

See the various solutions in this question for how to connect to the metastore:

How to connect to a Hive metastore programmatically in SparkSQL?
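One of those approaches, sketched below under the same assumptions as the earlier snippet, is to pass the metastore URI from the question programmatically so Spark talks to the Thrift metastore instead of creating a local Derby one:

import org.apache.spark.sql.SparkSession

// Point Spark at the remote metastore explicitly (the thrift URI is the one
// from the question) so it does not fall back to an embedded Derby metastore.
val spark = SparkSession.builder()
  .appName("Read Hive Table")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://127.0.0.1:9083")
  .enableHiveSupport()
  .getOrCreate()

// If this shows only "default" and none of your real databases,
// Spark is still using a local metastore rather than the remote one.
spark.sql("show databases").show()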
