I am trying to load a Hive table from a Spark program. Until now, I have used the spark-shell to load data into a Hive table. After learning how to do that, I wrote the Spark program in Eclipse that you can see below.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SaveMode

object SuperSpark {
  case class partclass(id: Int, name: String, salary: Int, dept: String, location: String)

  def main(args: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    val sparkSession = SparkSession.builder
      .master("local[2]")
      .appName("Saving data into HiveTable using Spark")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    import sparkSession.implicits._

    // Read the comma-separated input file, split each line and map it to the case class.
    val partfile = sparkSession.read.textFile("partfile")
    val partdata = partfile.map(p => p.split(","))
    val partRDD = partdata.map(line => partclass(line(0).toInt, line(1), line(2).toInt, line(3), line(4)))
    val partDF = partRDD.toDF()

    // Append the DataFrame into the existing Hive table.
    partDF.write.mode(SaveMode.Append).insertInto("parttab")
  }
}
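For context, parttab is an existing Hive table whose columns match the case class. The sketch below is roughly how I would create it from the SparkSession; the column types and storage format are my assumption, not the exact DDL I used:

// Sketch only: create a matching Hive table if it does not exist yet.
// The column list mirrors the partclass case class; types/format are my assumption.
sparkSession.sql(
  """CREATE TABLE IF NOT EXISTS parttab (
    |  id INT,
    |  name STRING,
    |  salary INT,
    |  dept STRING,
    |  location STRING
    |) STORED AS TEXTFILE""".stripMargin)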
The points I am confused about are:
- Where should I add the database details in the program, such as the hostname/IP address, port number, and database name? (See the sketch after this list for what I have in mind.)
- I am using Spark version 2.1.1; that is what the release notes in '/usr/local/spark' say (Spark 2.1.1 built for Hadoop 2.6.4). Do I need to use the HiveContext object to interact with Hive tables?
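For the first point, this is a minimal sketch of what I imagine the connection setup might look like. The metastore host, port 9083 and the database name "mydb" are placeholders I made up, not values from my cluster:

import org.apache.spark.sql.SparkSession

// Sketch only: "metastorehost", 9083 and "mydb" are placeholders.
val session = SparkSession.builder
  .master("local[2]")
  .appName("Hive connection sketch")
  // Point Spark at a remote Hive metastore service (host/port are guesses).
  .config("hive.metastore.uris", "thrift://metastorehost:9083")
  .enableHiveSupport()
  .getOrCreate()

// Select the database before writing, so the table name resolves inside it.
session.sql("USE mydb")

Is this the right place to put those details, or should they go into hive-site.xml instead?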
These are the dependencies in my pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>
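I suspect I may also need the spark-hive artifact for enableHiveSupport() to work, something like the snippet below, but I am not sure whether it is required or what scope it should have:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
</dependency>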
Could anyone tell me how I can proceed further?