
I have a Java class that defines my JSON, and this class contains some Integer properties. Spark is able to read this JSON and parse it using a bean encoder for that class. I am also able to run ordinary Spark SQL operations on the resulting Dataset.

But when I try to convert this Spark Dataset into a JavaRDD, or to collect the data at the driver using collectAsList or collect, it fails with a compilation error (a CompileException thrown at runtime while Spark compiles its generated code).

If I convert these Integer properties to Long, things start working.
So, what am I missing here with respect to "Spark-Java-Integers"?

Links like these just give the solution but do not explain the reason behind the actual problem.
1) https://issues.apache.org/jira/browse/SPARK-12036
2) Spark CSV - No applicable constructor/method found for actual parameters

Below is the code that I tried. It consists of 3 files. The first one is the main Spark driver code. The second one is the Person Java class that defines the JSON that I want to parse and work with. The third file is the JSON itself. At the end, I have also included the Spark dependencies from my pom.xml.

File 1: The main Spark Java Driver code

package com.suraj.spark;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import com.suraj.spark.pojos.Person;

public class PersonTest {
    public static void main(String... args) {
        SparkSession spark = SparkSession.builder().master("local").appName("Simple Application")
                .getOrCreate();

        Encoder<Person> pe = Encoders.bean(Person.class);

        Dataset<Person> pdf = spark.read().json("person.json").as(pe);
        pdf.show(); // This works

        JavaRDD<Person> prdd = pdf.toJavaRDD();
        System.out.println(prdd.take(1)); // This fails.
    }
}

File 2: The person class that defines my JSON

package com.suraj.spark.pojos;

import java.io.Serializable;

public class Person implements Serializable {

    private Integer age;
    private String name;
    private Double height;

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Double getHeight() {
        return height;
    }

    public void setHeight(Double height) {
        this.height = height;
    }

}

File 3: The file containing my JSON string. Keep this at project root.

{"name":"ravi","age":14,"height":6.4}
{"name":null,"age":12,"height":null}

File 4: Excerpts from my pom.xml

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.4</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.4</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.curator</groupId>
            <artifactId>apache-curator</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Expected: the program prints the output of the take(1) call.
Actual result: it fails at runtime with a CompileException from Spark's generated code, with the stack trace below.

Caused by: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 57, Column 37: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 57, Column 37: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.lang.Integer java.lang.Integer.valueOf(int)", "public static java.lang.Integer java.lang.Integer.valueOf(java.lang.String) throws java.lang.NumberFormatException", "public static java.lang.Integer java.lang.Integer.valueOf(java.lang.String, int) throws java.lang.NumberFormatException"

1 Answer


It's not an Integer you're getting, but a Long.

No applicable constructor/method found for actual parameters "long";

The generated code is trying to call something like Integer.valueOf(long) to convert it, but there is no such overload, since narrowing a long to an int is a lossy conversion. Change Person.age to Long and it'll work (easiest option).
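The missing overload can be checked with plain Java, independent of Spark. The following is only a minimal sketch: the class name `IntegerFromLongDemo` and both helper methods are made up for illustration. It looks up `Integer.valueOf` overloads by reflection, then shows why the conversion is treated as lossy and how an explicit, checked narrowing with `Math.toIntExact` would look.

```java
public class IntegerFromLongDemo {

    // Hypothetical helper: true if java.lang.Integer has a public
    // valueOf overload taking exactly the given parameter type.
    public static boolean hasValueOf(Class<?> paramType) {
        try {
            Integer.class.getMethod("valueOf", paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Hypothetical helper: explicit narrowing that throws
    // ArithmeticException instead of silently dropping the high bits.
    public static Integer toIntegerChecked(long value) {
        return Integer.valueOf(Math.toIntExact(value));
    }

    public static void main(String[] args) {
        System.out.println(hasValueOf(int.class));   // an int overload exists
        System.out.println(hasValueOf(long.class));  // no long overload exists

        // A plain cast keeps only the low 32 bits: 2^32 + 14 becomes 14.
        System.out.println((int) 4294967310L);

        // The checked conversion succeeds only when the value fits in an int.
        System.out.println(toIntegerChecked(14L));
    }
}
```

Java never narrows a long to an int implicitly, which is why the generated call with a long argument matches none of the candidates listed in the error message.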

  • Does this mean that Spark, while reading JSON using the class that defines age as an Integer, does not respect it and converts the value to Long internally? And when we ask it to convert the rows to objects of that class, it fails because that would be a lossy conversion? – surajs21 Oct 18 '19 at 07:50
  • @surajs21 depending on the libraries and language used, [there are no integers or longs](https://stackoverflow.com/questions/13502398/json-integers-limit-on-size). But for example Jackson treats integer numbers as Java `Long`s. – Kayaman Oct 18 '19 at 07:57
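
On the question raised in the comments: when no schema is supplied, Spark's JSON reader infers the schema from the data alone and reads whole numbers as bigint (LongType); the encoder passed to `as(...)` does not change what was already inferred. One possible way to keep `Integer` in the bean, sketched here under the assumption that the encoder's own schema matches the JSON fields, is to hand that schema to the reader. The class name `PersonSchemaTest` is made up for illustration; everything else comes from the question's code.

```java
package com.suraj.spark;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import com.suraj.spark.pojos.Person;

// Hypothetical variant of PersonTest that reads with an explicit schema.
public class PersonSchemaTest {
    public static void main(String... args) {
        SparkSession spark = SparkSession.builder().master("local").appName("Simple Application")
                .getOrCreate();

        Encoder<Person> pe = Encoders.bean(Person.class);

        // Use the schema derived from the bean (age as integer) instead of
        // letting the JSON reader infer age as bigint.
        Dataset<Person> pdf = spark.read().schema(pe.schema()).json("person.json").as(pe);
        pdf.printSchema();

        // With matching types, deserializing into Person should no longer
        // need a long-to-Integer conversion.
        JavaRDD<Person> prdd = pdf.toJavaRDD();
        System.out.println(prdd.take(1).get(0).getAge());

        spark.stop();
    }
}
```

Comparing `printSchema()` with and without the `schema(...)` call should make the difference between the inferred and the declared type of age visible.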