6

My build.sbt file has this:

scalaVersion := "2.10.3"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0"

I am running Spark in standalone cluster mode, and my SparkConf is SparkConf().setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077").setAppName("Simple Application") (I am not using the setJars method; I am not sure whether I need it).
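
For reference, a minimal sketch of what such an application would typically look like (the CSV read call and file path are assumptions based on the error below, not my exact code; it also assumes Spark 1.4+, where sqlContext.read is available):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // This is the kind of call that fails: Spark cannot find the
    // com.databricks.spark.csv data source on the runtime classpath.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("/path/to/data.csv") // placeholder path
    df.show()
  }
}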

I package the jar using the command sbt package. The command I use to run the application is ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar.

On running this, I get this error:

java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv

What's the issue?

kamalbanga

6 Answers

3

Use the dependencies that match your Spark and Scala versions. For example (an sbt equivalent is sketched after the XML):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
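
If you are building with sbt as in the question, a roughly equivalent set of dependencies might look like this (versions copied from the Maven coordinates above; marking the Spark artifacts as "provided" is a common choice when the cluster already supplies them):

libraryDependencies ++= Seq(
  // Spark itself is usually "provided" because the cluster ships it
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  // %% appends the Scala binary version (_2.10 here)
  "com.databricks"   %% "spark-csv"  % "1.4.0"
)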
Nander Speerstra
1

Include the option --packages com.databricks:spark-csv_2.10:1.2.0, but place it after --class "[classname]" and before the application jar (the target/... path).

claudiaann1
  • This is actually a valid answer which doesn't need to specify an explicit jar location. – deepdive Apr 08 '17 at 10:51
  • Here's how to do it: `spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 target/scala-2.11/sample-project_2.10-1.0.jar` – deepdive Apr 08 '17 at 10:52
0

Add the --jars option and download the jars below from a repository such as search.maven.org:

--jars commons-csv-1.1.jar,spark-csv-csv.jar,univocity-parsers-1.5.1.jar \

Using the --packages option as claudiaann1 suggested also works if you have internet access without a proxy. If you need to go through a proxy, it won't work.
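
As an alternative to the command-line flag, the same jars can presumably be passed programmatically via SparkConf.setJars (the method the question mentions). A minimal sketch, with placeholder paths:

val conf = new SparkConf()
  .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
  .setAppName("Simple Application")
  // Placeholder paths: the jars must exist on the driver machine,
  // and Spark then distributes them to the executors.
  .setJars(Seq(
    "/path/to/spark-csv_2.10-1.1.0.jar",
    "/path/to/commons-csv-1.1.jar",
    "/path/to/univocity-parsers-1.5.1.jar"
  ))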

Charles Menguy
Paul Z Wu
0

Here is the example that worked: spark-submit --jars file:/root/Downloads/jars/spark-csv_2.10-1.0.3.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar

Venkataramana
0

Use the command below; it works:

spark-submit --class ur_class_name --master local[*] --packages com.databricks:spark-csv_2.10:1.4.0 project_path/target/scala-2.10/jar_name.jar
Alper t. Turker
Raghav
-2

Have you tried using the --packages argument with spark-submit? I've run into this issue when Spark does not respect the dependencies listed in libraryDependencies.

Try this:

./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
                   --packages com.databricks:spark-csv_2.10:1.1.0 \
                   --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar


From the Spark Docs:

Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command.

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

dayman