I am writing a simple Spark SQL program in Java. The program reads data from a Cassandra table, converts the RDD into a Dataset, and displays the data. When I run the spark-submit command, I get the following error:

java.lang.ClassNotFoundException: org.apache.spark.internal.Logging

My program is:
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.*;

SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
        .set("spark.cassandra.connection.host", "abc")
        .set("spark.cassandra.auth.username", "def")
        .set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);

// Map each Cassandra row in test.log to an EventLog bean
JavaRDD<EventLog> logsRDD = javaFunctions(sparkContext)
        .cassandraTable("test", "log", mapRowTo(EventLog.class));

SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, EventLog.class);
logsDF.show();
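For reference, both mapRowTo(EventLog.class) and createDataFrame(logsRDD, EventLog.class) expect EventLog to be a serializable JavaBean whose properties match the Cassandra columns. A minimal sketch of what mine looks like (the field names here are placeholders, not my actual schema):

public class EventLog implements java.io.Serializable {
    // Placeholder fields; the real class mirrors the columns of test.log
    private String id;
    private String message;

    public EventLog() { }  // no-arg constructor required for the bean mapping

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
}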
My POM dependencies are:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
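Since the connector brings in its own transitive Spark dependencies, one thing I can do is inspect which Spark artifacts each dependency actually resolves to, for example:

# Show the resolved dependency tree, filtered to Spark artifacts
mvn dependency:tree -Dincludes=org.apache.spark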
My spark-submit command is:

/home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar
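One check I can run on the fat jar itself is whether any Spark Logging class was packaged into the assembly at all, for example:

# List the fat jar's contents and look for Spark's Logging classes
jar tf spark.dataframes-0.1-jar-with-dependencies.jar | grep Logging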
How do I solve this error? Downgrading Spark to 1.5.2 does not work, because 1.5.2 does not have org.apache.spark.sql.Dataset or org.apache.spark.sql.SparkSession.
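For context, under Spark 1.x the same step would have to go through SQLContext and DataFrame instead, which is why a plain downgrade forces a rewrite. A rough sketch of the 1.x equivalent, assuming the same logsRDD as above:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Spark 1.x equivalent: SQLContext instead of SparkSession,
// DataFrame instead of Dataset<Row>
SQLContext sqlContext = new SQLContext(sparkContext);
DataFrame logsDF = sqlContext.createDataFrame(logsRDD, EventLog.class);
logsDF.show();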