I am trying to fetch the size of a Spark Row like this, following this approach. Converting to an RDD gives a lot more issues, so I was trying to use toSeq and pass the result on to get the object size:
private[spark] def getEventSize(row: ssql.Row): Long = {
  ObjectSizeFetcher.getObjectSize(row.toSeq)
}
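For context, this is roughly how the method gets called; the DataFrame and column below are placeholders, not my real data:

import org.apache.spark.{sql => ssql}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("row-size-test").getOrCreate()
val df: ssql.DataFrame = spark.range(10).toDF("id")

// collect to the driver and measure each row with the method above
// (simplified -- in my code the call site lives next to getEventSize)
df.collect().foreach { row =>
  println(getEventSize(row))
}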
It does print the data (the leading "o" below comes from the println prefix in getObjectSize), but then throws a NullPointerException for the same object:
oWrappedArray(1, 1, 2, 2, 2.0, Map(a -> 1), a, a, 0, 1, Map(1 -> 1), 1, 1, 1.0, 0.0, 0, 1, 1.0)
Exception
java.lang.NullPointerException:
at com.expediagroup.dataquality.polaris.batchprofiler.utils.ObjectSizeFetcher.getObjectSize(ObjectSizeFetcher.java:16)
I am using Instrumentation.getObjectSize to fetch the size of the Spark row:
import java.lang.instrument.Instrumentation;

public class ObjectSizeFetcher {

    private static Instrumentation instrumentation;

    // The JVM calls this only when the jar is registered as a java agent (-javaagent);
    // otherwise `instrumentation` stays null.
    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    public static long getObjectSize(Object o) {
        System.out.println("o" + o);
        if (o == null) {
            return 0;
        }
        // The stack trace (ObjectSizeFetcher.java:16) points into this method;
        // the only reference that can be null here is `instrumentation`.
        return instrumentation.getObjectSize(o);
    }
}
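As far as I understand, premain is only invoked when the jar is loaded as a java agent, so the NPE makes me suspect the agent wiring. This is roughly what I think is needed (the jar name and path below are placeholders, not my actual setup):

In the agent jar's MANIFEST.MF:

Premain-Class: com.expediagroup.dataquality.polaris.batchprofiler.utils.ObjectSizeFetcher

And when submitting the job (driver side):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-javaagent:/path/to/object-size-agent.jar" \
  ...

If getObjectSize ends up running on the executors rather than the driver, I assume the same -javaagent option would also be needed in spark.executor.extraJavaOptions. Is this the right wiring, or is something else making instrumentation null here?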
Any help is appreciated