I am creating a Spark application in Scala as a Maven project. If possible, could someone share a working POM file? My application uses only Spark SQL.

Do I need to set HADOOP_HOME to the directory containing winutils.exe? I have not set it in the config part of the code.

My POM file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>net.martinprobson.spark</groupId>
    <artifactId>spark_example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>${project.artifactId}</name>
    <description>Spark Batch And Streaming Application</description>
    <inceptionYear>2019</inceptionYear>

    <properties>
        <scala.version>2.11</scala.version>
        <scala.full.version>2.11.8</scala.full.version>
        <spark.version>2.4.4</spark.version>
        <java.version>1.8</java.version>
        <jackson.version>2.6.5</jackson.version>
        <scala.maven.plugin.version>3.2.2</scala.maven.plugin.version>
        <maven.surefire.plugin.version>2.13</maven.surefire.plugin.version>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!--Scala dependencies-->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.full.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.2</version>
        </dependency>


        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-yarn -->

        <dependency>
            <groupId>com.typesafe</groupId>
            <artifactId>config</artifactId>
            <version>1.3.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-deploy-plugin</artifactId>
                <version>2.7</version>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.6</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
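                    <!-- scalaVersion expects the full compiler version (2.11.8), not the binary version (2.11) -->
                    <scalaVersion>${scala.full.version}</scalaVersion>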
                </configuration>
            </plugin>
            <!-- disable surefire -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.12.4</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
            <!-- enable scala test -->
            <plugin>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-maven-plugin</artifactId>
                <version>1.0</version>

                <configuration>
                    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
                    <junitxml>.</junitxml>
                    <filereports>TestSuite.txt</filereports>
                </configuration>
                <executions>
                    <execution>
                        <id>test</id>
                        <goals>
                            <goal>test</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <executions>
                    <execution>
                        <id>jar-with-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

My Scala code looks like this:

package Batch

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions._
object BatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Fraud Detector")
      .config("spark.driver.memory", "2g")
      .enableHiveSupport
      .getOrCreate()

    import spark.implicits._
    val financesDF = spark.read.json("Data/finances-small.json")


  }
}

But I am getting these errors:

**Cannot resolve symbol apache**

**Cannot resolve symbol SaveMode**

**Cannot resolve symbol SparkSession**

Is there a problem with the POM? I would highly appreciate any suggestions.

Kind Regards

DataQuest5
  • Change `class BatchJob` to `object`, and make sure to use the same version for `spark-core`, `spark-sql`, and the others. – koiralo Jun 28 '20 at 12:32
  • @koiralo: I made the change but still get the same error. I have edited the question with the modified code. Kindly share your thoughts. – DataQuest5 Jun 28 '20 at 13:11
  • If someone has a working POM file and is able to share it, kindly help. I greatly appreciate your help. – DataQuest5 Jun 28 '20 at 13:22
  • Your pom file looks fine to me. Make sure you have the correct folder structure for a Maven project, as described here: https://stackoverflow.com/a/45540348/6551426 – koiralo Jun 28 '20 at 15:48

1 Answer

You can simply replace your build tag with the one below. It worked for me.
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
    <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.2</version>
        <executions>
            <execution>
                <id>scala-compile-first</id>
                <phase>process-resources</phase>
                <goals>
                    <goal>add-source</goal>
                    <goal>compile</goal>
                </goals>
            </execution>
            <execution>
                <id>scala-test-compile</id>
                <phase>process-test-resources</phase>
                <goals>
                    <goal>testCompile</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <executions>
            <execution>
                <phase>compile</phase>
                <goals>
                    <goal>compile</goal>
                </goals>
            </execution>
        </executions>
    </plugin>
</plugins>
</build>
ajosh97
  • Thanks a lot, ajosh97. There is only one issue: I am getting the error "cannot resolve file scala". I installed Scala through the IntelliJ plugins. Do I need to change the folder structure? Kindly suggest. – DataQuest5 Jun 28 '20 at 15:49
  • In your project structure, look for the src/main/ and src/test directories, where a default java folder is created. Rename java to scala in both places and it should work. Please upvote if it works for you. – ajosh97 Jun 28 '20 at 16:16
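
On the HADOOP_HOME part of the question, which the thread leaves open: on Windows, Hadoop's client libraries generally look for winutils.exe under the bin folder of the directory named by the hadoop.home.dir system property, falling back to the HADOOP_HOME environment variable. The following is only a minimal sketch of setting that property from code before the SparkSession is built. `HadoopHomeSetup`, `winutilsPath`, `configure`, and the `C:\hadoop` location are hypothetical names chosen for illustration, not part of Spark or Hadoop.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper for illustration; not part of Spark or Hadoop.
object HadoopHomeSetup {

  // Location where winutils.exe is expected: <hadoopHome>/bin/winutils.exe
  def winutilsPath(hadoopHome: String): Path =
    Paths.get(hadoopHome, "bin", "winutils.exe")

  // Picks the first available value among the hadoop.home.dir property,
  // the HADOOP_HOME environment variable, and the supplied default,
  // then stores it in the hadoop.home.dir system property.
  def configure(defaultHome: String): String = {
    val home = Option(System.getProperty("hadoop.home.dir"))
      .orElse(sys.env.get("HADOOP_HOME"))
      .getOrElse(defaultHome)
    System.setProperty("hadoop.home.dir", home)
    home
  }

  def main(args: Array[String]): Unit = {
    // Assumed install location; adjust to wherever winutils.exe actually lives.
    val home = configure("C:\\hadoop")
    val found = Files.exists(winutilsPath(home))
    println(s"hadoop.home.dir=$home, winutils.exe present: $found")
  }
}
```

In the job from the question, `HadoopHomeSetup.configure(...)` would be called at the top of `main`, before `SparkSession.builder`. If HADOOP_HOME is already set as an environment variable, nothing extra should be needed in the code.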