
I need to calculate the size of an RDD in Java.

In Scala, it was quite easy and I used the following code:

rdd.map(_.getBytes("UTF-8").length.toLong).reduce(_+_)

which gives the right size.
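For comparison, here is a minimal sketch of the same per-record computation in plain Java. It assumes the records are Strings and that "size" means their UTF-8 encoded byte length, as in the Scala snippet. It uses java.util.stream instead of Spark so that it runs on its own. With a JavaRDD<String>, the corresponding call would be rdd.map(s -> (long) s.getBytes(StandardCharsets.UTF_8).length).reduce(Long::sum). The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper mirroring the Scala map/reduce: sums UTF-8 byte lengths.
public class Utf8SizeSketch {

    // Total bytes the strings occupy when encoded as UTF-8.
    // This is the encoded text size, not the JVM heap footprint of the objects.
    public static long utf8Size(List<String> records) {
        return records.stream()
                .mapToLong(s -> s.getBytes(StandardCharsets.UTF_8).length) // map: per-record byte count
                .sum();                                                      // reduce(_ + _)
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) takes 2 bytes in UTF-8.
        List<String> records = List.of("abc", "h\u00e9llo", "");
        System.out.println(utf8Size(records)); // 3 + 6 + 0 = 9
    }
}
```

Like the Scala version, this counts the encoded text only. It does not measure how much memory the deserialized objects take inside the executors.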

In Java, I found this:

SizeEstimator.estimate(rdd)

However, it returns the wrong size, off by a huge margin. How can I correctly estimate the RDD size in Java?

The answer to "How can I find the size of a RDD" uses

rows.apply

but that does not work in Java. In Scala, rows is initialized as a val, whereas in Java rdd.collect() returns Object, so that answer is not really applicable.

aran
