I want to time how long my Spark program takes to execute, but lazy evaluation makes this quite difficult. Consider this (meaningless) code:
var graph = GraphLoader.edgeListFile(context, args(0))
val graph_degs = graph.outerJoinVertices(graph.degrees).triplets.cache
/* I'd need to start the timer here */
val t1 = System.currentTimeMillis
val edges = graph_degs.flatMap(trip => { /* do something*/ })
.union(graph_degs)
val count = edges.count
val t2 = System.currentTimeMillis
/* I'd need to stop the timer here */
println("It took " + (t2 - t1) + " ms to count " + count)
The thing is, transformations are lazy, so nothing gets evaluated before the val count = edges.count line. But as far as I can tell, t1 gets its value before any of the code above it has actually been evaluated: the code above t1 only runs after the timer has started, despite its position in the source. That's a problem, because the measured time then also includes loading the graph and computing graph_degs.
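To illustrate what I mean, here is a minimal plain-Scala sketch (no Spark involved). The `timed` helper and the `TimingSketch` object are hypothetical names I made up, and a `lazy val` stands in for the cached RDD. The idea I am wondering about is forcing the upstream work before starting the timer; in my Spark code I assume the analogue would be calling an action such as `graph_degs.count` before `t1`, provided the cached data fits in memory.

```scala
object TimingSketch {
  // Hypothetical helper (not a Spark API): evaluates `block` and returns
  // its result together with the elapsed wall-clock time in milliseconds.
  def timed[T](block: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the cached RDD: computed on first access, then reused.
    lazy val upstream: Vector[Long] = (1 to 1000000).map(_ * 2L).toVector

    // Force the upstream work BEFORE the timer starts.
    // Assumed Spark analogue: graph_degs.count
    val forced = upstream.size

    // Only the work inside the block is measured now.
    val (count, ms) = timed { upstream.map(_ + 1).size }
    println("Forced " + forced + " elements up front; it took " + ms + " ms to count " + count)
  }
}
```

Without the `upstream.size` line, the first access inside `timed` would also pay for building `upstream`, which is the same effect I see with `t1` in the Spark version.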
In the Spark Web UI I can't find anything useful for this, since I need the time spent after that specific line of code. Is there an easy way to see when a group of transformations actually gets evaluated?