
After following the setup instructions for spark-atlas-connector, I am getting the error below while running simple code to create a table in Spark.

Spark2 2.3.1, Atlas 1.0.0

The batch command is:

```
spark-submit --jars /home/user/spark-atlas-connector/spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  --files /home/user/atlas-application.properties \
  --master local \
  /home/user/SparkAtlas/test.py
```

```
Exception in thread "SparkCatalogEventProcessor-thread" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:36)
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:28)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
    at scala.Option.foreach(Option.scala:257)
    at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
```

Thanks in advance.

user3190018

1 Answer


This is a clear indication of a jar version mismatch.
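One quick way to confirm a mismatch like this is to check whether the class from the stack trace is actually packaged in the Spark jars on your classpath (a jar is just a zip archive). A minimal standalone sketch, not part of the original answer; `demo.jar` and its contents are illustrative stand-ins for the real jars under `$SPARK_HOME/jars`:

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if class_name is packaged inside the jar.

    A jar is a zip archive; a class a.b.C is stored as the entry a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Build a tiny stand-in "jar" for the demo; with a real install you would
# point jar_contains_class at the jars under $SPARK_HOME/jars instead.
with zipfile.ZipFile("demo.jar", "w") as jar:
    jar.writestr("org/apache/spark/sql/SparkSession.class", b"")

print(jar_contains_class("demo.jar", "org.apache.spark.sql.SparkSession"))  # True
print(jar_contains_class(
    "demo.jar",
    "org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener"))   # False
```

If the class from the `ClassNotFoundException` is missing from every jar on the classpath, the connector was built against a newer Spark than the one you are running it on.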

For the latest Atlas version, 2.0.0, these are the dependencies:

```xml
<spark.version>2.4.0</spark.version>
<atlas.version>2.0.0</atlas.version>
<scala.version>2.11.12</scala.version>
```

For Atlas 1.0.0, see its pom.xml; these are the dependencies:

```xml
<spark.version>2.3.0</spark.version>
<atlas.version>1.0.0</atlas.version>
<scala.version>2.11.8</scala.version>
```

Try using the correct versions of the jars by checking the pom.xml mentioned in the link.

Note:
1) If you add one jar by seeing an error and downloading it, you will just hit a roadblock somewhere else. I advise you to use the correct versions from the start.
2) Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.1 uses Scala 2.11, so you will need a compatible Scala version (2.11.x). Check your Scala version, as you have not mentioned it in the question.
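The compatibility rule in note 2 can be sketched as a small check: the artifact suffix (e.g. the `_2.11` in `spark-atlas-connector_2.11`) must match the *binary* version (`major.minor`) of the Scala you compile against; the patch level (2.11.8 vs 2.11.12) does not matter. A pure-Python illustration of that rule, with example version strings:

```python
def scala_binary_version(full_version):
    """'2.11.8' -> '2.11': Scala binary compatibility is per major.minor."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"

def compatible(artifact_suffix, scala_version):
    """True if an artifact with this suffix (e.g. '_2.11') matches this Scala."""
    return artifact_suffix.lstrip("_") == scala_binary_version(scala_version)

print(compatible("_2.11", "2.11.8"))   # True  (2.11.12 would also be fine)
print(compatible("_2.11", "2.12.10"))  # False (binary-incompatible)
```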

Ram Ghadiyaram
  • just check this – Ram Ghadiyaram May 06 '20 at 22:45
  • Thanks for your reply. The Scala version is 2.11.8. I tried using the pom.xml from the link and the package compilation errored out. I am now trying the same pom.xml with changed parameters, i.e. matching my HDP versions for Spark2, Atlas, and Kafka; I also changed Scala to 2.11.8 – Hammad Hasan May 07 '20 at 00:03
  • That attempt errored out too, with the same error on compile: `Note: implicit value formats is not applicable here because it comes after the application point and it lacks an explicit result type` – Hammad Hasan May 07 '20 at 00:08
  • Check your code; this compile error comes from your code. For [example](https://stackoverflow.com/questions/24273617/compile-error-when-using-a-companion-object-of-a-case-class-as-a-type-parameter?rq=1), you have to adjust your code a bit to match the API versions you specified – Ram Ghadiyaram May 07 '20 at 00:11
  • Following the [link](https://github.com/hortonworks-spark/spark-atlas-connector), it downloads all the required dependencies and compiles. It compiled previously with the default options but showed the error when running my code with the JAR; with your suggestions, it downloads the dependencies but errors out on compiling the jar. I am just running the same command mentioned, `mvn package -DskipTests`. – Hammad Hasan May 07 '20 at 00:17
  • Do one thing: run `mvn clean package -DskipTests` first. If you are using IntelliJ, there will be a `re-import` option on the Maven pane; click that – Ram Ghadiyaram May 07 '20 at 00:20
  • Tried that, with the same error; below is the last bit of the logs: `[WARNING] two warnings found [ERROR] 31 errors found [INFO] Reactor Summary for spark-atlas-connector-main_2.11 0.1.0-SNAPSHOT: [INFO] spark-atlas-connector-main_2.11 SUCCESS [INFO] spark-atlas-connector_2.11 FAILURE [INFO] spark-atlas-connector-assembly . SKIPPED [INFO] BUILD FAILURE [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on project spark-atlas-connector_2.11: Execution default of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed ` – Hammad Hasan May 07 '20 at 00:27
  • You still have not configured the correct versions, from what I see; these are API incompatibility issues – Ram Ghadiyaram May 07 '20 at 00:32
  • I have used this [pom.xml](http://s000.tinyupload.com/index.php?file_id=63993220376935748276); let me know if I should change any of the parameters – Hammad Hasan May 07 '20 at 00:37
  • $JAVA_HOME is set to `/usr/lib/jvm/java-1.8.0-openjdk` – Hammad Hasan May 07 '20 at 00:40
  • I was not able to open your pom.xml; firewall issue – Ram Ghadiyaram May 07 '20 at 00:41
  • I downloaded and compiled https://github.com/hortonworks-spark/spark-atlas-connector/archive/atlas-1.0.zip and it's working fine – Ram Ghadiyaram May 07 '20 at 00:56
  • Download that, and on top of it add your versions; I think you added some additional jar dependencies, which I guess is the cause – Ram Ghadiyaram May 07 '20 at 01:03
  • `import org.apache.atlas.hbase.bridge.HBaseAtlasHook._` is one of the compilation errors; the dependency for it, `hbase-bridge` (`org.apache.atlas hbase-bridge ${atlas.version}`), is not in your pom.xml. You also have 3 Kafka dependencies where the original project has only one... I cannot list everything here; look at it carefully and fix these – Ram Ghadiyaram May 07 '20 at 01:11
  • Download github.com/hortonworks-spark/spark-atlas-connector/archive/ and add your version, 2.3.1, manually; it should work – Ram Ghadiyaram May 07 '20 at 01:12
  • Thank you so much for your help; it started showing the lineage. However, I am seeing one warning while running an insert statement from Spark: `WARN SparkExecutionPlanProcessor: Caught exception during parsing event java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.AnalysisBarrier cannot be cast to org.apache.spark.sql.catalyst.plans.logical.Project` – Hammad Hasan May 07 '20 at 02:07
  • When you are eligible to vote up, don't forget to [vote up](https://meta.stackexchange.com/a/173400/369717) – Ram Ghadiyaram May 07 '20 at 02:49