Using IntelliJ IDEA and Maven, I'm trying to read in a CSV file and convert it into a Hive table (writing it out as Parquet would also be fine for now). This is my current code:
import org.apache.spark.sql.SparkSession
import scala.io.Source
import org.apache.spark.sql.types._

object main extends App {
  val spark = SparkSession.builder.master("local").appName("my-spark-app").enableHiveSupport().getOrCreate()

  val lines = Source.fromFile("C://share_VB/file_name.csv").getLines.toArray
  //val myDF = spark.read.csv("C://share_VB/file_name.csv")
  //myDF.write.save("C://Users/my_name/ParquetFiles")

  for (line <- lines) {
    if (!line.isEmpty) {
      val testcase = line.split(",").toBuffer
      println(testcase.head)
      println(testcase(1))
      testcase.remove(0, 2)
      while (testcase.nonEmpty) {
        println(testcase.head)
        println(testcase(1))
        testcase.remove(0, 2)
      }
    }
  }
}
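For context, the two commented-out lines above are the DataFrame route I'd eventually like to take once the session actually starts; roughly something like this (just a sketch, and the header option, output path, and table name are placeholders for my setup):

// sketch only: read the CSV into a DataFrame, then persist it
val myDF = spark.read
  .option("header", "true")              // assuming the CSV has a header row
  .csv("C://share_VB/file_name.csv")

// either dump the DataFrame out as Parquet files...
myDF.write.parquet("C://Users/my_name/ParquetFiles")

// ...or save it as a Hive table (which is why I call enableHiveSupport())
myDF.write.saveAsTable("my_csv_table")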
The pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>seeifthisworks</groupId>
    <artifactId>seeifthisworks</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.8</scala.version>
        <scala.compat.version>2.11</scala.compat.version>
        <spark.version>2.2.0.cloudera1</spark.version>
        <config.version>1.3.2</config.version>
        <scalatest.version>3.0.1</scalatest.version>
        <spark-testing-base.version>2.2.0_0.8.0</spark-testing-base.version>
    </properties>

    <!-- set repositories first, so that dependencies use the URL for the repos -->
    <repositories>
        <repository>
            <id>Maven</id>
            <url>http://repo1.maven.org/maven2</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.compat.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.compat.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
It runs perfectly if I comment out the val spark = SparkSession... line. However, if I leave it in and try to run anything, I run into this error:
Error: Unable to initialize main class main
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
But it seems fairly clear that I've imported SparkSession, and Maven: org.apache.spark:spark-core_2.11:2.2.0.cloudera1 is listed in my project libraries, so in theory it should work.
Can someone help me pinpoint the problem and explain how to fix it?
EDIT: After removing <scope>provided</scope> from the Spark dependencies, I now encounter a different error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.apache.spark.util.Utils$.getCallSite(Utils.scala:1440)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at main$.delayedEndpoint$main$1(main.scala:7)
at main$delayedInit$body.apply(main.scala:6)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
at main$.main(main.scala:6)
at main.main(main.scala)
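For reference, after that change the Spark dependencies in the pom.xml now look like this (spark-core is declared the same way, just without the provided scope):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
</dependency>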