I have years of experience with Java 8 and its lambdas, but I ran into a baffling problem while developing a hello-world-sized Spark program.
Here is a Java class; the @Data and constructor annotations are from Lombok:
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Person implements Serializable {
    private String name;
    private Long age;
}
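For context (a sketch only, not the exact Lombok output): @Data together with the constructor annotations effectively expands to a plain JavaBean along these lines, which is what Encoders.bean(Person.class) later inspects through its getter/setter pairs:

import java.io.Serializable;

// Rough hand-written equivalent of the Lombok-annotated class above (illustration only).
public class Person implements Serializable {
    private String name;
    private Long age;

    public Person() {}                       // no-arg constructor, needed by Encoders.bean
    public Person(String name, Long age) {   // all-args constructor, used when building the list
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Long getAge() { return age; }
    public void setAge(Long age) { this.age = age; }

    // @Data also generates equals(), hashCode(), and toString(); omitted here for brevity.
}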
Then I built a Java list containing objects of the Person class:
Person p1 = new Person("sb", 1L);
Person p2 = new Person("sth", null);
List<Person> list = new ArrayList<>(2);
list.add(p1);
list.add(p2);
So far so good. Then I tried to create a Spark Dataset from the list:
SparkSession session = SparkSession.builder().master("local[1]").appName("SparkSqlApp").getOrCreate();
Encoder<Person> personEncoder = Encoders.bean(Person.class);
Dataset<Person> dataset1 = session.createDataset(list, personEncoder);
dataset1.foreach(new ForeachFunction<Person>() { // 1
    @Override
    public void call(Person person) throws Exception {
        System.out.println(person);
    }
});
dataset1.foreach((ForeachFunction<Person>) System.out::println); //2
Notice that block 1 is equivalent to block 2 in Java; block 2 was simplified from block 1 by IntelliJ IDEA. The only difference is that block 2 uses a lambda expression (a method reference, to be precise) instead of an anonymous class.
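Just to make that "equivalent" claim concrete: written out as an explicit lambda rather than a method reference, block 2 would look like the sketch below. I am only showing the syntax here, not claiming anything about how it behaves at runtime:

// 3: the same loop written as an explicit lambda instead of a method reference.
// Purely a syntactic comparison with blocks 1 and 2 above.
dataset1.foreach((ForeachFunction<Person>) person -> System.out.println(person));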
However, when I execute the program, block 1 runs fine while block 2 fails with an exception:
What on earth is going on? Why would the JVM or the Spark engine behave like this?!