I have a simple, maybe weird, question. For the following code, the DAG is executed twice, which is expected because I'm calling an action two times:
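```scala
// Assumed: errorLines is a LongAccumulator (its definition is not shown in the question)
val errorLines = sc.longAccumulator("errorLines")
val input = sc.parallelize(List(1, 2, 3, 4))
val result = input.map { x =>
  println("!!! Input Map !!!")
  errorLines.add(1)
  (x, 1)
}
//.reduceByKey(_ + _)
println(result.count())
println(result.collect().mkString(", ")) // mkString prints the tuples, not the array reference
```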
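The commented-out line changes the lineage of `result`, not just its values. A minimal sketch for comparing the two variants, assuming a local master (the object `LineageCompare` and its `lineage` helper are hypothetical names, not from the question), prints `RDD.toDebugString` for each; `toDebugString` only describes the RDD graph and does not run a job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds the question's pipeline with or without the
// reduceByKey step and returns its lineage as text.
object LineageCompare {

  def lineage(sc: SparkContext, withShuffle: Boolean): String = {
    val mapped: RDD[(Int, Int)] = sc.parallelize(List(1, 2, 3, 4)).map(x => (x, 1))
    // Mirrors uncommenting the reduceByKey line in the question.
    val result = if (withShuffle) mapped.reduceByKey(_ + _) else mapped
    result.toDebugString // describes the RDD graph; no job is submitted
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lineage-compare")
    val sc = new SparkContext(conf)
    try {
      println(lineage(sc, withShuffle = false))
      println(lineage(sc, withShuffle = true))
    } finally {
      sc.stop()
    }
  }
}
```

With `reduceByKey` the printed lineage should include a `ShuffledRDD` entry marking the shuffle boundary; without it, only the mapped and parallelized RDDs appear.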
If I uncomment the reduceByKey line, the DAG is executed only once, even though reduceByKey is a transformation and I'm still calling two actions.

Does that mean Spark doesn't always recompute the DAG?
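One way to make this observation measurable, rather than counting `println` lines, is to read an accumulator on the driver after both actions. The sketch below is a hypothetical harness (the object `MapInvocationCount`, the helper `mapInvocations` and the accumulator name are illustrative), assuming Spark 2.x or later for `sc.longAccumulator` and a local master so that no tasks are retried.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical harness (not from the question): counts how often the map
// closure runs when the same RDD is used by two actions.
object MapInvocationCount {

  def mapInvocations(sc: SparkContext, withShuffle: Boolean): Long = {
    // Fresh accumulator per run so the two variants don't share a counter.
    val calls = sc.longAccumulator("mapCalls")

    val mapped = sc.parallelize(List(1, 2, 3, 4)).map { x =>
      calls.add(1) // one increment per element each time the map is computed
      (x, 1)
    }
    // Mirrors uncommenting the reduceByKey line in the question.
    val result = if (withShuffle) mapped.reduceByKey(_ + _) else mapped

    result.count()   // first action
    result.collect() // second action on the same RDD
    calls.sum        // total map invocations across both actions
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("map-invocation-count")
    val sc = new SparkContext(conf)
    try {
      println(s"without reduceByKey: ${mapInvocations(sc, withShuffle = false)}")
      println(s"with reduceByKey:    ${mapInvocations(sc, withShuffle = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

If the behaviour described in the question holds, the first total is 8 (4 elements times 2 actions) and the second is 4.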