I use Hortonworks 2.6 on a 5-node cluster and submit my application to YARN with spark-submit (16 GB RAM and 4 cores).
I have an RDD transformation that runs fine with the local master URL but not with the yarn master URL.
rdd1 has values like the following (dates are in dd/mm/yyyy format):
id name date
1 john 10/05/2001
2 steve 11/06/2015
I'd like to change the date format from dd/mm/yyyy to mm/dd/yy, so I wrote a method, transformations.transform, that I call inside RDD.map as follows:
val rdd2 = rdd1.map { rec => (rec.split(",")(0), transformations.transform(rec)) }
The transformations.transform method looks like this:
object transformations {
  // Debugging version: prints a message containing the input record and returns that message
  def transform(t: String): String = {
    val msg = s">>> transformations.transform($t)"
    println(msg)
    msg
  }
}
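The stub above only echoes its input. For reference, a version that actually performs the dd/mm/yyyy to mm/dd/yy conversion might look like the sketch below. It assumes the records are comma-separated as "id,name,date" (matching the split(",") in the map call), and DateReformat, reformat and transformRecord are hypothetical names chosen for illustration, not part of the original code. It uses only java.time, so it can be tested without Spark.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper object (not the original transformations object).
// DateTimeFormatter is immutable and thread-safe; because it lives in a
// top-level object, each JVM initializes it once, on first use.
object DateReformat {
  private val inFmt  = DateTimeFormatter.ofPattern("dd/MM/yyyy") // e.g. 10/05/2001
  private val outFmt = DateTimeFormatter.ofPattern("MM/dd/yy")   // e.g. 05/10/01

  // Parses a dd/MM/yyyy string and renders it as MM/dd/yy.
  def reformat(date: String): String =
    LocalDate.parse(date.trim, inFmt).format(outFmt)

  // Assumption: a record is "id,name,date"; only the date field is rewritten.
  def transformRecord(rec: String): String = {
    val fields = rec.split(",")
    require(fields.length == 3, s"expected 3 comma-separated fields in: $rec")
    s"${fields(0)},${fields(1)},${reformat(fields(2))}"
  }
}
```

Under the same assumptions it would plug into the existing map as rdd1.map(rec => (rec.split(",")(0), DateReformat.transformRecord(rec))).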
The code above works fine locally but not on the cluster. There, the result looks as if the map had been written like this:
val rdd2 = rdd1.map { rec => (rec.split(",")(0), rec) }
In other words, rec does not seem to be passed to the transformations.transform method.
I do call an action to trigger transformations.transform(), but with no luck:
val rdd3 = rdd2.count()
println(rdd3)
println prints the count, but transformations.transform does not seem to be called. Why?