6

I am trying to write a simple Spark SQL program in Java. In the program, I get data from a Cassandra table, convert the RDD into a Dataset and display the data. When I run the spark-submit command, I get the error: `java.lang.ClassNotFoundException: org.apache.spark.internal.Logging`.

My program is:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
        .set("spark.cassandra.connection.host", "abc")
        .set("spark.cassandra.auth.username", "def")
        .set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);
// Read the Cassandra table test.log and map each row to a Log bean
JavaRDD<Log> logsRDD = javaFunctions(sparkContext).cassandraTable("test", "log",
        mapRowTo(Log.class));
SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, Log.class);
logsDF.show();

My POM dependencies are:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>   
</dependencies>

My spark-submit command is: /home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar
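
(The build section of the POM is not shown in the post. A `*-jar-with-dependencies.jar` artifact like the one named in the command above is typically produced with the maven-assembly-plugin; the plugin version and configuration below are a sketch based on that assumption, not taken from the original POM.)

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
                <!-- Standard descriptor that bundles all non-provided dependencies into one jar -->
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>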

How do I solve this error? Downgrading to 1.5.2 does not work as 1.5.2 does not have org.apache.spark.sql.Dataset and org.apache.spark.sql.SparkSession.

khateeb
  • @T.Gawęda The solution there does not work for me because 1.5.2 does not have `org.apache.spark.sql.Dataset` and `org.apache.spark.sql.SparkSession`. – khateeb Dec 06 '16 at 12:44
  • Please check connector version 2.0 - see https://github.com/datastax/spark-cassandra-connector – T. Gawęda Dec 06 '16 at 13:16
  • @T.Gawęda Connector 2.0 is still in beta. I used it and I get this error: `NullPointerException at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465) at org.apache.spark.sql.SparkSession.getSchema(SparkSession.scala:673) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:340) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:359) at com.jtv.spark.dataframes.App.main(App.java:25)` – khateeb Dec 06 '16 at 13:26
  • But Connector 1.6 does not support Spark 2.x. This error means you've got the wrong Guava version. Run `mvn dependency:tree` and find where you've got conflicts – T. Gawęda Dec 06 '16 at 13:27
  • @T.Gawęda I get this error when I use Connector 2.0.0-M3 not 1.6. I had used Connector 1.6 with Spark 2.0 in other programs. The problem starts when I use Spark SQL packages. – khateeb Dec 06 '16 at 13:30
  • In pom.xml you have `1.6.3` ;) Spark uses Guava and maybe some other lib on the classpath also and there is a version conflict – T. Gawęda Dec 06 '16 at 13:32
  • @T.Gawęda I changed it at your suggestion. I posted the result in the comment which showed a guava version mismatch. – khateeb Dec 06 '16 at 13:34
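
Following up on the comment thread above, here is a minimal sketch of what a version-aligned dependency section could look like for Spark 2.0.2, using the connector 2.0 line that T. Gawęda points to. The connector version shown (2.0.0-M3, the milestone mentioned in the comments) is an assumption for illustration; whether a newer 2.0.x release is available can be checked on the connector's GitHub page, and the Guava conflict behind the `TypeToken` NullPointerException would still need to be confirmed with `mvn dependency:tree`.

<dependencies>
    <!-- Spark itself is supplied by spark-submit at runtime, so it can stay provided -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <!-- Connector from the 2.0 line, built against Spark 2.0.x; exact version is an assumption -->
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>2.0.0-M3</version>
    </dependency>
</dependencies>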

5 Answers

1

This may be a problem with your IDE. Since some of these packages are written in Scala rather than Java, the IDE is sometimes unable to work out what is going on. I am using IntelliJ and it keeps displaying this message to me, but when I run `mvn test` or `mvn package` everything is fine. Please check whether this is really a package error or just the IDE being lost.

Thiago Mata
0

Spark Logging is available for Spark version 1.5.2 and lower, but not for higher versions. So your dependencies in pom.xml should look like this:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.2</version>
  </dependency>   
</dependencies>

Please let me know if it works or not.

SachinSarawgi
  • Tried it. Didn't work. 1.5.2 does not have `org.apache.spark.sql.Dataset` and `org.apache.spark.sql.SparkSession`. – khateeb Dec 06 '16 at 12:36
  • Then for those you can use the updated version, and for the others the older version. Try it and let me know. – SachinSarawgi Dec 06 '16 at 12:37
  • @Khateeb Hi, did you try the solution? What error is it showing now? – SachinSarawgi Dec 06 '16 at 12:41
  • Getting error: `[24,57] cannot access org.apache.spark.internal.Logging` – khateeb Dec 06 '16 at 12:42
  • @Khateeb I think there is some problem in the Spark configuration. Please read http://stackoverflow.com/questions/34108613/why-does-scala-compiler-fail-with-object-sparkconf-in-package-spark-cannot-be-a – SachinSarawgi Dec 06 '16 at 12:46
  • I have been running other programs using the same Spark config with no problem. This problem started when I started to use Spark SQL. – khateeb Dec 06 '16 at 12:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/129911/discussion-between-sachinsarawgi-and-khateeb). – SachinSarawgi Dec 06 '16 at 12:50
0

The dependency below worked fine in my case.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>
Avijit
0

Pretty late to the party here, but I added

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.1</version>
  <scope>provided</scope>
</dependency>

to solve this issue. It seems to work in my case.

Brian
0

Make sure you have the correct Spark version in the pom.xml.

Previously, I had a different version of Spark installed locally, and that is why I was getting the "cannot access Spark Logging class" error in the IntelliJ IDE.

In my case, changing it from 2.4.2 to 2.4.3 solved it.

You can get the Spark version and Scala version from the spark-shell command.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
AP-Big Data