
I have a three-node Spark cluster and a three-node Ignite cluster. Spark version: 2.3, Ignite version: 2.7.

This is how I set the classpath in Spark's spark-defaults.conf:

spark.driver.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*

In my Spark (Java) code, I am creating a dataframe and writing to Ignite like this:

df.write()
.format(IgniteDataFrameSettings.FORMAT_IGNITE())
.option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), confPath)
.option(IgniteDataFrameSettings.OPTION_TABLE(), tableName)
.mode(SaveMode.Append)
.option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), primaryKey)
.option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(), "template=partitioned")
.save();

I am getting the following error in Spark:

java.lang.ClassNotFoundException: Failed to find data source: ignite. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)

Caused by: java.lang.ClassNotFoundException: ignite.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)

What else should I do to resolve this issue? Any help is highly appreciated. Thank you.


2 Answers


Update: as mentioned in the Ignite deployment docs, you should also set the executor classpath along with the driver classpath:

spark.executor.extraClassPath /opt/ignite/libs/*:/opt/ignite/libs/optional/ignite-spark/*:/opt/ignite/libs/optional/ignite-log4j/*:/opt/ignite/libs/optional/ignite-yarn/*:/opt/ignite/libs/ignite-spring/*

I think this is the real issue.
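
For the installation path used in the question, the combined spark-defaults.conf would look roughly like this (a sketch; adjust the directories to your actual Ignite installation on each node):

spark.driver.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*
spark.executor.extraClassPath /home/user/apache-ignite-2.7.0-bin/libs/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/*:/home/user/apache-ignite-2.7.0-bin/libs/ignite-spring/*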


http://apache-ignite-users.70518.x6.nabble.com/Spark-Ignite-connection-using-Config-file-td21827.html

It seems like you have to use a lower version of Ignite.

For Ignite 2.6:

<dependency> 
    <groupId>org.apache.ignite</groupId> 
    <artifactId>ignite-spark</artifactId> 
    <version>2.6.0</version> 
</dependency> 

You can see in the ignite-spark 2.6.0 POM (source) that it compiles against Spark 2.3.0:

<dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-core_2.11</artifactId> 
    <version>2.3.0</version> 
    <scope>compile</scope> 
</dependency> 

Also see:
1) IGNITE-8534, which was fixed in Ignite 2.6
2) the dev-list discussion "Upgrade Ignite Spark Module's Spark version to 2.3.0"

Call the function below from your driver; it prints all classpath entries so you can see which jars are actually on your classpath. The ignite-spark jar should be present in that list at runtime.

The caller would be:

urlsinclasspath(getClass.getClassLoader).foreach(println)


def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
  case _ => urlsinclasspath(cl.getParent)
}

If you want to add jar dependencies without using wildcards, you can see my answer below, which adds all the jars from a folder dynamically to the path you provide:

Spark spark-submit --jars arguments wants comma list, how to declare a directory of jars?

You should have the above-mentioned ignite-spark jar in this folder: /home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark/. Use the above approach and add the jars by folder.
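
If you prefer to build that list programmatically, here is a minimal sketch in Java (the directory is the one from the question; the class and method names are just illustrative). It collects every jar under a folder into one delimited string that you can pass as a classpath (":" separator) or as a spark-submit --jars value ("," separator):

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JarList {
    // Collect all *.jar files directly under a directory into a single
    // delimited string, e.g. ":" for extraClassPath or "," for --jars.
    static String jarsUnder(String dir, String separator) {
        File[] files = new File(dir).listFiles((d, name) -> name.endsWith(".jar"));
        List<String> paths = new ArrayList<>();
        if (files != null) {
            for (File f : files) {
                paths.add(f.getAbsolutePath());
            }
        }
        return String.join(separator, paths);
    }

    public static void main(String[] args) {
        // Ignite installation path from the question.
        System.out.println(jarsUnder(
            "/home/user/apache-ignite-2.7.0-bin/libs/optional/ignite-spark", ":"));
    }
}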

  • Lowering to Ignite 2.6 did not help. I am still getting the same error message. – Sam May 30 '19 at 06:34
  • Seems like you have classpath issues with respect to the folders you mentioned in spark-defaults.conf. Please check that, and also share your spark-submit command so we can validate whether any wrong entries are present. – Ram Ghadiyaram May 30 '19 at 14:48
  • Ram, this is clearly a classpath issue. I used your suggestion to print the classpath entries, and the Ignite-related classes are not in it. I followed what Ignite's official documents say and I could not make it run. – Sam May 31 '19 at 02:13
  • As a workaround (though I do not like this approach), I copied all jars from Ignite's libs and subdirectories into Spark's jars directory (excluding those that are already in Spark), and everything seems to work fine. This proves that the issue is with these jars not getting loaded onto the classpath. I will continue working to resolve it and will post back here. Thank you all for your great help. – Sam May 31 '19 at 02:15

This error means that the following resource is missing from your classpath:

META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

That file is part of the ignite-spark dependency; when Spark cannot find a registered data source for the short name "ignite", it falls back to loading ignite.DefaultSource, which is exactly the class name in your stack trace.
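
A quick way to check this from your driver is a snippet along these lines (a minimal sketch; the resource path is the one above, and IgniteDataFrameSettings is the class already used in the question's code):

import java.net.URL;

public class ClasspathCheck {
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();

        // Spark's DataSource.lookupDataSource resolves the short name "ignite"
        // through this ServiceLoader registration file in the ignite-spark jar.
        URL registration = cl.getResource(
            "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister");
        System.out.println("DataSourceRegister service file: " + registration);

        // A class that ships in the ignite-spark jar (used in the question's code).
        try {
            Class.forName("org.apache.ignite.spark.IgniteDataFrameSettings", false, cl);
            System.out.println("ignite-spark classes are on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("ignite-spark classes are NOT on the classpath");
        }
    }
}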

So what you should check:

1) That ignite-spark-2.7.0.jar exists on the classpath of all the nodes where Spark runs.

2) If you use spark.driver.extraClassPath, then please check that:

a. You run in client mode (--deploy-mode client), because Spark fires up a Netty HTTP server which distributes the files to each of the worker nodes on startup. In cluster mode, Spark selects a worker node to execute the driver process on, which means the job isn't running directly from the node you submit it from.

b. I am not sure, but it looks like extraClassPath requires a list of jar files instead of /path/to/lib/*. You can try the following:

EXECUTOR_PATH=""
for eachjarinlib in $JARS/*.jar ; do
    # skip your application jar, which is passed to spark-submit separately
    if [ "$eachjarinlib" != "APPLICATIONJARTOBEADDEDSEPARATELY.JAR" ]; then
        EXECUTOR_PATH=$eachjarinlib:$EXECUTOR_PATH
    fi
done
spark-submit --deploy-mode client --master yarn --conf "spark.driver.extraClassPath=$EXECUTOR_PATH" --class $EXAMPLE_CLASS $PATH_TO_JAR

where $JARS is the path to your libs directory.