I need to read several CSV files and convert several columns from String to Double. The code looks like this:
    def f(s: String): Double = s.toDouble

    def readonefile(path: String) = {
      val data = for {
        line <- sc.textFile(path)
        arr = line.split(",").map(_.trim)
        id = arr(33)                 // column 33 holds the record id
      } yield {
        // columns 9 to 14 hold the numeric fields to convert
        val countings = ((9 to 14) map arr).toVector map f
        id -> countings
      }
      data
    }
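For reference, the method is invoked from the driver roughly like this (the path and the take(5) inspection below are placeholders I added for illustration, not my real job):

    // "data.csv" is a placeholder path
    val rdd = readonefile("data.csv")

    // pull a few records back to the driver to inspect them
    rdd.take(5).foreach { case (id, counts) => println(s"$id -> $counts") }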
If I write the toDouble call as an explicitly defined function (e.g. the function f in the code above), Spark throws java.io.IOException or java.lang.ExceptionInInitializerError.
However, if I change countings to

    val countings = ((9 to 14) map arr).toVector map (_.toDouble)

then everything works fine.
Is the function f serializable?
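One way to probe this outside Spark (a minimal sketch; SerializationCheck is a hypothetical object I made up for the test) is to eta-expand f to a function value, the way map f does, and push it through plain Java serialization:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    object SerializationCheck {
      def f(s: String): Double = s.toDouble

      def main(args: Array[String]): Unit = {
        // `f _` builds the Function1 value that `map f` would ship to executors
        val g: String => Double = f _
        val oos = new ObjectOutputStream(new ByteArrayOutputStream())
        oos.writeObject(g) // throws NotSerializableException if g captures
                           // a non-serializable enclosing instance
        oos.close()
        println("f _ serialized without error")
      }
    }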
EDIT:
Some people say it is the same issue as Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects, but then why doesn't it throw a Task not serializable exception?
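For what it's worth, a common workaround for that class of problem is to move the conversion into a standalone serializable object, so the closure only captures that small object instead of the enclosing class. A minimal sketch (Conversions and readonefile2 are hypothetical names, not my actual code):

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: a top-level serializable object, so the shipped
    // closure references only this object rather than the enclosing class.
    object Conversions extends Serializable {
      def toDoubleField(s: String): Double = s.toDouble
    }

    def readonefile2(sc: SparkContext, path: String): RDD[(String, Vector[Double])] =
      sc.textFile(path).map { line =>
        val arr = line.split(",").map(_.trim)
        val countings = ((9 to 14) map arr).toVector map Conversions.toDoubleField
        arr(33) -> countings
      }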
Scala version: 2.10
Spark version: 1.3.1
Environment: yarn-client