My code is written in Scala with Spark. Now I need to measure the elapsed time of particular functions in it.
Should I use spark.time like this? But then how can I properly assign the value of df?
val df = spark.time(myObject.retrieveData(spark, indices))
Or should I do it in this way?
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block // call-by-name: the block is evaluated here
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  result
}
val df = time{myObject.retrieveData(spark, indices)}
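A variant of the helper above that I am also considering, as a minimal sketch: instead of printing, it returns the elapsed time together with the result, so the caller can both keep the value and log or aggregate the duration. The names Timing and timed are hypothetical, not part of Spark or the Scala standard library.

```scala
import scala.concurrent.duration._

object Timing {
  // Hypothetical helper: evaluates the by-name block exactly once and
  // returns its result paired with the wall-clock time it took.
  def timed[R](block: => R): (R, FiniteDuration) = {
    val t0 = System.nanoTime()
    val result = block // the block runs here
    val elapsed = (System.nanoTime() - t0).nanos
    (result, elapsed)
  }
}
```

With this, the call would look like val (df, elapsed) = Timing.timed { myObject.retrieveData(spark, indices) }. Since DataFrame transformations are lazy, the measured time only covers real work if the block triggers an action.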
Update:
As recommended in the comments, I now run df.rdd.count inside myObject.retrieveData in order to materialise the DataFrame.