
I have a three-node Spark cluster and a three-node Ignite cluster. Spark version: 2.3, Ignite version: 2.7.

This is how I set the classpath in Spark's spark-defaults.conf:

spark.driver.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*

In my Spark (Java) code, I am creating a dataframe and writing to Ignite like this:

df.write()
.format(IgniteDataFrameSettings.FORMAT_IGNITE())
.option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), confPath)
.option(IgniteDataFrameSettings.OPTION_TABLE(), tableName)
.mode(SaveMode.Append)
.option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), primaryKey)
.option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(), "template=partitioned")
.save();

I am getting the following error in Spark:

java.lang.ClassNotFoundException: Failed to find data source: ignite. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)

Caused by: java.lang.ClassNotFoundException: ignite.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)

What else should I do to resolve this issue? Any help is highly appreciated. Thank you.


2 Answers


Update: as mentioned in the Ignite deployment docs, you should also set the executor classpath along with the driver classpath:

spark.executor.extraClassPath /opt/ignite/libs/*:/opt/ignite/libs/optional/ignite-spark/*:/opt/ignite/libs/optional/ignite-log4j/*:/opt/ignite/libs/optional/ignite-yarn/*:/opt/ignite/libs/ignite-spring/*

I think this is the real issue.
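
For the installation path used in the question, the combined spark-defaults.conf would look roughly like this (a sketch; adjust the directories to your actual Ignite installation on each node):

spark.driver.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*
spark.executor.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*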


http://apache-ignite-users.70518.x6.nabble.com/Spark-Ignite-connection-using-Config-file-td21827.html

It seems like you have to use a lower version of Ignite.

For Ignite 2.6:

<dependency> 
    <groupId>org.apache.ignite</groupId> 
    <artifactId>ignite-spark</artifactId> 
    <version>2.6.0</version> 
</dependency> 

You can see in the ignite-spark 2.6.0 POM (source) that it compiles against Spark 2.3.0:

<dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-core_2.11</artifactId> 
    <version>2.3.0</version> 
    <scope>compile</scope> 
</dependency> 

Also see:
1) IGNITE-8534, which was fixed in Ignite 2.6
2) the dev-list discussion "Upgrade Ignite Spark Module's Spark version to 2.3.0"

Call the function below from your driver; it prints all classpath entries so you can see which jars are actually on your classpath. The ignite-spark jar should be present in that list at runtime.

The caller would be:

urlsinclasspath(getClass.getClassLoader).foreach(println)


def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
  case _ => urlsinclasspath(cl.getParent)
}

If you want to add jar dependencies without using wildcards, you can see my answer below, which adds all the jars from a folder dynamically to the path you provide:

Spark spark-submit --jars arguments wants comma list, how to declare a directory of jars?

You should have the above-mentioned ignite-spark jar in this folder: /home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/. Use the above approach and add the jars by folder.
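
If you prefer to build that list programmatically, here is a minimal sketch in Java (the directory is the one from the question; the class and method names are just illustrative). It collects every jar under a folder into one delimited string that you can pass as a classpath (":" separator) or as a spark-submit --jars value ("," separator):

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JarList {
    // Collect all *.jar files directly under a directory into a single
    // delimited string, e.g. ":" for extraClassPath or "," for --jars.
    static String jarsUnder(String dir, String separator) {
        File[] files = new File(dir).listFiles((d, name) -> name.endsWith(".jar"));
        List<String> paths = new ArrayList<>();
        if (files != null) {
            for (File f : files) {
                paths.add(f.getAbsolutePath());
            }
        }
        return String.join(separator, paths);
    }

    public static void main(String[] args) {
        // Ignite installation path from the question.
        System.out.println(jarsUnder(
            "/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark", ":"));
    }
}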

  • Lowering to Ignite 2.6 did not help. I am still getting the same error message. – Sam May 30 '19 at 06:34
  • Seems like you have classpath issues with respect to the folders you mentioned in spark-defaults.conf. Please check that, and also share your spark-submit command so we can validate whether any wrong entries are present. – Ram Ghadiyaram May 30 '19 at 14:48
  • Ram, this is clearly a classpath issue. I used your suggestion to print the classpath entries, and the Ignite-related classes are not in it. I followed what Ignite's official documents say and I could not make it run. – Sam May 31 '19 at 02:13
  • As a workaround (though I do not like this approach), I copied all jars from Ignite's libs and subdirectories into Spark's jars directory (excluding those that are already in Spark), and everything seems to work fine. This proves that the issue is with these jars not getting loaded onto the classpath. I will continue working to resolve it and will post back here. Thank you all for your great help. – Sam May 31 '19 at 02:15

This error means that the following resource is missing from your classpath:

META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

That file is part of the ignite-spark dependency; when Spark cannot find a registered data source for the short name "ignite", it falls back to loading ignite.DefaultSource, which is exactly the class name in your stack trace.
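
A quick way to check this from your driver is a snippet along these lines (a minimal sketch; the resource path is the one above, and IgniteDataFrameSettings is the class already used in the question's code):

import java.net.URL;

public class ClasspathCheck {
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();

        // Spark's DataSource.lookupDataSource resolves the short name "ignite"
        // through this ServiceLoader registration file in the ignite-spark jar.
        URL registration = cl.getResource(
            "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister");
        System.out.println("DataSourceRegister service file: " + registration);

        // A class that ships in the ignite-spark jar (used in the question's code).
        try {
            Class.forName("org.apache.ignite.spark.IgniteDataFrameSettings", false, cl);
            System.out.println("ignite-spark classes are on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("ignite-spark classes are NOT on the classpath");
        }
    }
}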

So what you should check:

1) That ignite-spark-2.7.0.jar exists on the classpath of all the nodes where Spark runs.

2) If you use spark.driver.extraClassPath, then please check that:

a. You run in client mode (--deploy-mode client), because Spark fires up a Netty HTTP server which distributes the files to each of the worker nodes on startup. In cluster mode, Spark selects a worker node to execute the driver process on, which means the job isn't running directly from the node you submit it from.

b. I am not sure, but it looks like extraClassPath requires a list of jar files instead of /path/to/lib/*. You can try the following:

EXECUTOR_PATH=""
for eachjarinlib in $JARS/*.jar ; do
    # skip your application jar, which is passed to spark-submit separately
    if [ "$eachjarinlib" != "APPLICATIONJARTOBEADDEDSEPARATELY.JAR" ]; then
        EXECUTOR_PATH=$eachjarinlib:$EXECUTOR_PATH
    fi
done
spark-submit --deploy-mode client --master yarn --conf "spark.driver.extraClassPath=$EXECUTOR_PATH" --class $EXAMPLE_CLASS $PATH_TO_JAR

where $JARS is the path to your libs directory.