I have been struggling with this for quite some time.
Step 1: I create a Hive table and load data as follows:
create external table if not exists productstorehtable2
(
  device string,
  date string,
  word string,
  count int
)
row format delimited fields terminated by ','
location 'hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct2';

LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/input/productstore' INTO TABLE productstorehtable2;
Step 2: I write a simple Spark script as a sanity check:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.log4j.{Level, Logger}

object HivePortStreamer {

  def readFromHiveTable(hivecontext: HiveContext): Unit = {
    //val productDF = hivecontext.sql("select * from productstorehtable2")
    //productDF.show()
    println("PRINTING THE HIVE TABLES")
    // show() renders the query result as a table; println on the
    // DataFrame itself would only print its schema
    hivecontext.sql("show tables").show()
  }

  def main(args: Array[String]): Unit = {
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)

    val conf = new SparkConf()
      .setAppName("hivePortStreamer")
      .setMaster("local[*]")
      .set("spark.sql.warehouse.dir",
        "hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct2")

    val sparkcontext = new SparkContext(conf)
    val hivecontext = new HiveContext(sparkcontext)

    readFromHiveTable(hivecontext)
    sparkcontext.stop()
  }
}
When I run this script, the output is blank: no tables are listed. I don't get it; I have given the correct warehouse directory location. The same is the case with the 'show databases' command.
Is it some issue with how Spark and Hive are configured on my system?
I use sbt. I tried the same code in spark-shell and got the same output.
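For completeness, this is roughly what my build.sbt looks like (a minimal sketch; the exact version numbers are from memory and may differ from what is on the VM):

name := "HivePortStreamer"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-sql"  % "2.0.0",
  "org.apache.spark" %% "spark-hive" % "2.0.0"  // needed for HiveContext
)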
Edit 1: Spark seems unable to discover the existing Hive tables. However, when I run
hivecontext.sql("create table dummytable(id int)")
it creates the Hive table as expected.
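For reference, my understanding is that in Spark 2.0 the idiomatic entry point is SparkSession with Hive support enabled; a minimal sketch of the same check would look like this (assuming hive-site.xml, pointing at the Cloudera metastore, is on Spark's classpath, which I suspect may be the actual issue):

// Sketch only: same sanity check via the Spark 2.0 SparkSession API.
import org.apache.spark.sql.SparkSession

object HiveSessionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveSessionCheck")
      .master("local[*]")
      .enableHiveSupport()  // wires the session to the Hive metastore
      .getOrCreate()

    // If the metastore is picked up correctly, productstorehtable2
    // should appear here alongside any other Hive tables.
    spark.sql("show tables").show()

    spark.stop()
  }
}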
Kindly help. Thanks.
Background: CentOS, Cloudera QuickStart VM, Spark 2.0.