I need to calculate the size of an RDD in Java.
In Scala, it was quite easy and I used the following code:
rdd.map(_.getBytes("UTF-8").length.toLong).reduce(_+_)
which gives the size I expect: the total number of UTF-8 bytes across all strings in the RDD.
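For reference, the Scala expression maps each string to its UTF-8 byte length and sums the results. It counts the encoded bytes of the string contents, not the RDD's in-memory footprint. The following is a minimal plain-Java sketch of the same computation over a local list. It does not use Spark, and the class and method names (Utf8Size, totalUtf8Bytes) are hypothetical. On a JavaRDD<String>, the analogous call would presumably be rdd.map(s -> (long) s.getBytes(StandardCharsets.UTF_8).length).reduce(Long::sum), but that form is an untested assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class Utf8Size {
    // Same map/reduce shape as the Scala one-liner, applied to a local List
    // instead of an RDD (hypothetical helper, no Spark involved).
    static long totalUtf8Bytes(List<String> rows) {
        return rows.stream()
                // map: each string -> number of bytes in its UTF-8 encoding
                .mapToLong(s -> s.getBytes(StandardCharsets.UTF_8).length)
                // reduce: add all the lengths together
                .sum();
    }

    public static void main(String[] args) {
        // "abc" = 3 bytes, "h\u00e9llo" = 6 bytes (the accented e takes 2 bytes), "" = 0
        List<String> rows = Arrays.asList("abc", "h\u00e9llo", "");
        System.out.println(totalUtf8Bytes(rows)); // prints 9
    }
}
```

Because the result is the encoded byte count, it will generally not match a JVM memory estimate of the same strings, which also includes object headers and other overhead.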
In Java, I found this:
SizeEstimator.estimate(rdd)
However, it returns the wrong size, off by a huge margin. How can I correctly estimate the RDD size in Java?
In the answer to "How can I find the size of a RDD", the approach using
rows.apply
does not work in Java: there, rows is declared as a val in Scala, whereas rdd.collect() returns Object in Java. So it is not really applicable here.