I'm trying to take the top 25 items of a JavaPairRDD like this:
JavaPairRDD<String, Long> rdd = ...;
List<Tuple2<String, Long>> top25 = rdd.top(25, (t1, t2) -> {
    // Compare by value first (negated), then fall back to the key.
    if (!t1._2.equals(t2._2)) {
        return -1 * Long.compare(t1._2, t2._2);
    }
    else {
        return t1._1.compareTo(t2._1);
    }
});
The comparator orders the tuples first by value and, if the values are equal, by key. When I run it, I get the following exception:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
I think the problem is that the inline lambda function playing the role of the Comparator is not serializable.
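To make that guess concrete, here is a minimal, Spark-free sketch of what I mean. It only checks whether a comparator lambda can be written with plain Java serialization. `Map.Entry<String, Long>` stands in for `Tuple2<String, Long>`, and the class and method names (`SerializableComparatorSketch`, `plainComparator`, `serializableComparator`, `canSerialize`) are made up for illustration. I have not confirmed that this is exactly what Spark rejects in my job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;
import java.util.Map;

public class SerializableComparatorSketch {

    // Same ordering logic as in my snippet, with Map.Entry standing in for Tuple2.
    static int compareEntries(Map.Entry<String, Long> e1, Map.Entry<String, Long> e2) {
        if (!e1.getValue().equals(e2.getValue())) {
            return -1 * Long.compare(e1.getValue(), e2.getValue());
        }
        return e1.getKey().compareTo(e2.getKey());
    }

    // Plain lambda: the target type is only Comparator, which does not
    // extend Serializable, so the resulting object is not serializable.
    static Comparator<Map.Entry<String, Long>> plainComparator() {
        return (e1, e2) -> compareEntries(e1, e2);
    }

    // Intersection cast: the target type now also includes Serializable,
    // so the compiler generates a serializable lambda.
    static Comparator<Map.Entry<String, Long>> serializableComparator() {
        return (Comparator<Map.Entry<String, Long>> & Serializable)
                (e1, e2) -> compareEntries(e1, e2);
    }

    // Returns true if plain Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(plainComparator()));        // prints "false"
        System.out.println(canSerialize(serializableComparator())); // prints "true"
    }
}
```

If my guess is right, the same intersection cast (or a named class implementing both `Comparator` and `Serializable`) might be what `top` needs, but that is an assumption on my part.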
I have two questions. First, assuming my guess is correct, why does the Comparator need to be serializable? Second, how can I solve this problem?