I have a Java class that defines my JSON structure, and it contains some Integer properties. Spark can read the JSON and parse it with a bean encoder built from this class, and normal Spark SQL operations work fine.
But when I try to convert the Spark Dataset into a JavaRDD, or to collect data at the driver using collectAsList or collect, it fails at runtime with a compile error in Spark's generated code.
If I convert these Integer properties to Long, things start working.
So, what am I missing here with respect to "Spark-Java-Integers"?
Links like these give the solution, but do not explain the underlying problem:
1) https://issues.apache.org/jira/browse/SPARK-12036
2) Spark CSV - No applicable constructor/method found for actual parameters
Below is the code that I tried. There are three files: the main Spark driver code, the Person Java class that defines the JSON I want to parse and work with, and the JSON itself. At the end, I have also included the Spark dependencies from my pom.xml.
File 1: The main Spark Java Driver code
package com.suraj.spark;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import com.suraj.spark.pojos.Person;
public class PersonTest {
    public static void main(String... args) {
        SparkSession spark = SparkSession.builder().master("local").appName("Simple Application")
                .getOrCreate();
        Encoder<Person> pe = Encoders.bean(Person.class);
        Dataset<Person> pdf = spark.read().json("person.json").as(pe);
        pdf.show(); // This works.
        JavaRDD<Person> prdd = pdf.toJavaRDD();
        System.out.println(prdd.take(1)); // This fails.
    }
}
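For reference, here is a workaround sketch I am considering (an assumption on my part, based on the idea that the JSON reader infers age as LongType while the bean encoder expects IntegerType for my Integer field): cast the column down to int before applying the encoder. This fragment would replace the read line in the driver above; `spark` and `pe` are the same variables as there.

```java
import static org.apache.spark.sql.functions.col;

// Hypothetical variant of the read above: cast "age" down to int so the
// inferred long matches the bean's Integer field before .as(pe) runs.
Dataset<Person> pdf = spark.read().json("person.json")
        .withColumn("age", col("age").cast("int"))
        .as(pe);
```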
File 2: The person class that defines my JSON
package com.suraj.spark.pojos;
import java.io.Serializable;
public class Person implements Serializable {
    private Integer age;
    private String name;
    private Double height;

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Double getHeight() {
        return height;
    }

    public void setHeight(Double height) {
        this.height = height;
    }
}
File 3: The file containing my JSON. Keep this file at the project root.
{"name":"ravi","age":14,"height":6.4}
{"name":null,"age":12,"height":null}
File 4: Excerpts from my pom.xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.4</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.4</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.curator</groupId>
            <artifactId>apache-curator</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Expected: the output of take(1) is printed.
Actual: it fails at runtime with the stack trace below (a compile error in Spark's generated code, not in my own code):
Caused by: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 57, Column 37: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 57, Column 37: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.lang.Integer java.lang.Integer.valueOf(int)", "public static java.lang.Integer java.lang.Integer.valueOf(java.lang.String) throws java.lang.NumberFormatException", "public static java.lang.Integer java.lang.Integer.valueOf(java.lang.String, int) throws java.lang.NumberFormatException"
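My own reading of the stack trace (an assumption, not a confirmed explanation): the JSON reader infers age as a long, and the generated deserializer then calls Integer.valueOf with a long argument, for which no overload exists, exactly as the candidate list in the error shows. Plain Java exhibits the same boxing gap:

```java
public class LongToIntegerDemo {
    public static void main(String[] args) {
        long parsed = 14L;              // JSON whole numbers arrive as long
        // Integer bad = parsed;                   // does not compile: no boxing from long to Integer
        // Integer bad2 = Integer.valueOf(parsed); // same "no applicable method" error as generated.java
        Integer viaCast = (int) parsed; // an explicit narrowing cast is required
        Long viaBoxing = parsed;        // boxing long -> Long works directly,
                                        // which matches my observation that Long fields work
        System.out.println(viaCast + " " + viaBoxing);
    }
}
```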