
I am trying to load a Hive table from a Spark program. Until now, I have used the Spark shell to load data into Hive tables. Having learned that, I wrote a Spark program in Eclipse, which you can see below.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SaveMode

object SuperSpark {
  case class partclass(id:Int, name:String, salary:Int, dept:String, location:String)
  def main(argds: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    val sparkSession = SparkSession.builder
                        .master("local[2]")
                        .appName("Saving data into HiveTable using Spark")
                        .enableHiveSupport()
                        .config("hive.exec.dynamic.partition", "true")
                        .config("hive.exec.dynamic.partition.mode", "nonstrict")
                        .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
                        .config("spark.sql.warehouse.dir", warehouseLocation)
                        .getOrCreate()
    import sparkSession.implicits._

    val partfile = sparkSession.read.textFile("partfile")
    val partdata = partfile.map(p => p.split(","))
    val partRDD  = partdata.map(line => partclass(line(0).toInt, line(1), line(2).toInt, line(3), line(4)))
    val partDF   = partRDD.toDF()
    partDF.write.mode(SaveMode.Append).insertInto("parttab")
  }
}

The points I am confused about are:

  1. Where should I add the database details in the program, such as the hostname/IP address, port number, and database name?
  2. I am using Spark version 2.1.1; that is what the release notes file in '/usr/local/spark' says (Spark 2.1.1 built for Hadoop 2.6.4). Do I need to use the HiveContext object to interact with Hive tables?

These are the dependencies in my pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>

Could anyone tell me how I can proceed further?

Metadata
  • Your code isn't compatible with Spark 1.6, and all your pom dependencies point to Spark 2.1.1. This code won't run on Spark 1.6 because `SparkSession` doesn't exist in 1.6. Answering the second part of your question: yes, you need to use `HiveContext`. More details [in this question](https://stackoverflow.com/questions/30664008/how-to-save-dataframe-directly-to-hive) – philantrovert Jun 29 '17 at 06:20
  • @philantrovert I have updated the version details in the question. I got that information from the release notes file in the folder '/usr/local/spark'. Is my code compatible with the version I have mentioned? If it is, what changes need to be made to the program? – Metadata Jun 29 '17 at 07:13
  • Yeah, your code is fine if you're using Spark 2.1. Follow the link in my previous comment and you'll find more details there on how to save a table to Hive. – philantrovert Jun 29 '17 at 07:21
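
For context on the `HiveContext` discussion above: `HiveContext` is only needed on Spark 1.x, where `SparkSession` does not exist. Below is a minimal sketch of that older API, reusing the question's `partfile`/`parttab` names and assuming a Spark 1.6 `spark-hive` dependency is on the classpath (the object name is illustrative only):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    object SuperSparkLegacy {
      case class PartRecord(id: Int, name: String, salary: Int, dept: String, location: String)

      def main(args: Array[String]): Unit = {
        // Spark 1.x entry points: a SparkContext plus a HiveContext built on top of it.
        val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Hive insert on Spark 1.x"))
        val hiveContext = new HiveContext(sc)
        import hiveContext.implicits._

        // Same parsing logic as in the question, but starting from an RDD[String].
        val partDF = sc.textFile("partfile")
          .map(_.split(","))
          .map(a => PartRecord(a(0).toInt, a(1), a(2).toInt, a(3), a(4)))
          .toDF()

        partDF.write.mode(SaveMode.Append).insertInto("parttab")
      }
    }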

1 Answer


I think you need to provide the metastore URIs. You have two options:

  • Use a hive-site.xml on the classpath of the application you run (if you are following a standard Maven structure, you can place it in the resources folder):

    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://192.168.1.134:9083</value>
        </property>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
    </configuration>

  • In your Spark code, configure your SparkSession object with a property like this (a fuller sketch follows after this list):

    .config("hive.metastore.uris", "thrift://192.168.1.134:9083")
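
Putting the second option into the question's program, here is a minimal sketch. The metastore URI is the example value from above, the file/table names (`partfile`, `parttab`) mirror the question, and it assumes the matching `spark-hive_2.11` artifact is on the classpath (the posted pom only lists spark-core and spark-sql):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object SuperSpark {
      case class PartRecord(id: Int, name: String, salary: Int, dept: String, location: String)

      def main(args: Array[String]): Unit = {
        // The metastore URI tells Spark where the Hive metastore service runs;
        // no separate host/port/database details are needed in the program itself.
        val spark = SparkSession.builder
          .master("local[2]")
          .appName("Saving data into Hive table using Spark")
          .config("hive.metastore.uris", "thrift://192.168.1.134:9083") // example value from above
          .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Parse the comma-separated file into a typed Dataset, then append into the existing Hive table.
        val partDF = spark.read.textFile("partfile")
          .map(_.split(","))
          .map(a => PartRecord(a(0).toInt, a(1), a(2).toInt, a(3), a(4)))
          .toDF()

        partDF.write.mode(SaveMode.Append).insertInto("parttab")
      }
    }

With the first option no code change is needed at all: `enableHiveSupport()` picks up `hive.metastore.uris` from the `hive-site.xml` found on the classpath.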

dumitru