How to create pom.xml for maven using SparkSql and Hive?

Question

I have created a Maven Project for SparkSql and Hive connectivity and written the following example code:

SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark Hive Example")
            .master("local[*]")
            .config("hive.metastore.uris", "thrift://localhost:9083")
            .enableHiveSupport()
            .getOrCreate();
try{
    spark.sql("select * from health").show();
} catch(Exception AnalysisException) {
    System.out.println("table not found");
}

I am using Spark 2.1.0 and Hive 1.2.1

For running the above code, I import the Jar files from the Spark folder and included it in the project. I haven't used Maven Pom.xml for this particular job. But when I am moving to the bigger clusters like on AWS, I need to run my JAR file.

I am not able to run as the Maven is not able to find the dependencies. So I thought of adding the dependencies. I tried this:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.2.1</version>
</dependency>

But it didn't work and I am not able to see the output what previously I was getting through adding JAR files.
I want to know whether I did anything wrong, if yes then please suggest me what to do? Even as per Spark instructions from the documentation, how I can add the hive-site.xml and hdfs-site.xml with my project in pom.xml? Currently using IntelliJ. Please let me know what I can do to resolve my issue?

Prasad Khode · Accepted Answer · 2017-03-04T07:56:22.467

0

I see there is a mis-configuration of depencies.

In your maven dependency your spark-sql & spark-hive are of version 1.2.1 but spark-core is of version 2.1.0

Change all the dependencies to same version number and that should work

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>2.1.0</version>
</dependency>

spark-core dependency http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/2.1.0 spark-sql dependency http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/2.1.0 spark-hive dependency http://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10/2.1.0

edited Mar 04 '17 at 07:56

answered Mar 04 '17 at 07:36

Prasad Khode

6,602
11
44
59

What about the `hive-site.xml` ? Do I need to put it manually? – Jaffer Wilson Mar 04 '17 at 07:37
yes you need to put `hive-site.xml` file in your project `resources` – Prasad Khode Mar 04 '17 at 07:40
No brother it didn't worked. I tried. It is showing red color on the dependencies. I think the dependencies do not exists with the maven. – Jaffer Wilson Mar 04 '17 at 07:49
those dependencies exists in maven repository, check the links that I have added in answer – Prasad Khode Mar 04 '17 at 07:57
Strange.. I have included the dependencies but see the image you will know: https://ibb.co/mqxgov – Jaffer Wilson Mar 04 '17 at 08:00
1

there is typo in ur `pom.xml` check at line no 26, closing dependency tag is defined twice, remove that line and try to build once again... – Prasad Khode Mar 04 '17 at 08:06

How to create pom.xml for maven using SparkSql and Hive?

1 Answers1

Linked