6

My build.sbt file has this:

scalaVersion := "2.10.3"
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.1.0"

I am running Spark in standalone cluster mode, and my SparkConf is SparkConf().setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077").setAppName("Simple Application") (I am not using the setJars method; I am not sure whether I need it).
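
For reference, a minimal sketch of what such an application would typically look like (the CSV read call and file path are assumptions based on the error below, not my exact code; it also assumes Spark 1.4+, where sqlContext.read is available):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // This is the kind of call that fails: Spark cannot find the
    // com.databricks.spark.csv data source on the runtime classpath.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("/path/to/data.csv") // placeholder path
    df.show()
  }
}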

I package the jar using the command sbt package. The command I use to run the application is ./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar.

On running this, I get this error:

java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv

What's the issue?

kamalbanga

6 Answers

3

Use the dependencies that match your Spark and Scala versions. For example (an sbt equivalent is sketched after the XML):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
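
If you are building with sbt as in the question, a roughly equivalent set of dependencies might look like this (versions copied from the Maven coordinates above; marking the Spark artifacts as "provided" is a common choice when the cluster already supplies them):

libraryDependencies ++= Seq(
  // Spark itself is usually "provided" because the cluster ships it
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  // %% appends the Scala binary version (_2.10 here)
  "com.databricks"   %% "spark-csv"  % "1.4.0"
)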
Nander Speerstra
1

Include the option --packages com.databricks:spark-csv_2.10:1.2.0, but place it after --class "[classname]" and before the application jar (the target/... path).

claudiaann1
  • This is actually a valid answer which doesn't need to specify an explicit jar location. – deepdive Apr 08 '17 at 10:51
  • Here's how to do it: `spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 target/scala-2.11/sample-project_2.10-1.0.jar` – deepdive Apr 08 '17 at 10:52
0

Add the --jars option and download the jars below from a repository such as search.maven.org:

--jars commons-csv-1.1.jar,spark-csv-csv.jar,univocity-parsers-1.5.1.jar \

Using the --packages option as claudiaann1 suggested also works if you have internet access without a proxy. If you need to go through a proxy, it won't work.
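
As an alternative to the command-line flag, the same jars can presumably be passed programmatically via SparkConf.setJars (the method the question mentions). A minimal sketch, with placeholder paths:

val conf = new SparkConf()
  .setMaster("spark://ec2-[ip].compute-1.amazonaws.com:7077")
  .setAppName("Simple Application")
  // Placeholder paths: the jars must exist on the driver machine,
  // and Spark then distributes them to the executors.
  .setJars(Seq(
    "/path/to/spark-csv_2.10-1.1.0.jar",
    "/path/to/commons-csv-1.1.jar",
    "/path/to/univocity-parsers-1.5.1.jar"
  ))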

Charles Menguy
Paul Z Wu
0

Here is the example that worked: spark-submit --jars file:/root/Downloads/jars/spark-csv_2.10-1.0.3.jar,file:/root/Downloads/jars/commons-csv-1.2.jar,file:/root/Downloads/jars/spark-sql_2.11-1.4.1.jar --class "SampleApp" --master local[2] target/scala-2.11/my-proj_2.11-1.0.jar

Venkataramana
0

Use the command below; it works:

spark-submit --class ur_class_name --master local[*] --packages com.databricks:spark-csv_2.10:1.4.0 project_path/target/scala-2.10/jar_name.jar
Alper t. Turker
Raghav
-2

Have you tried using the --packages argument with spark-submit? I've run into this issue when Spark does not respect the dependencies listed in libraryDependencies.

Try this:

./bin/spark-submit --master spark://ec2-[ip].compute-1.amazonaws.com:7077 \
                   --packages com.databricks:spark-csv_2.10:1.1.0 \
                   --class "[classname]" target/scala-2.10/[jarname]_2.10-1.0.jar


From the Spark Docs:

Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command.

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

dayman