My final RDD looks like this:
finalRDD.collect()
Array[(Int, Seq[Iterable[Int]])] = Array((1,List(List(97), List(98), List(99), List(100))), (2,List(List(97, 98), List(97, 99), List(97, 101))), (3,List(List(97, 98, 99),List(99, 102, 103))))
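For anyone who wants to experiment, the same data can be rebuilt in the shell like this (a minimal reconstruction, assuming sc is the SparkContext; in my job finalRDD is actually produced upstream):

// Rebuild the collected data shown above as an RDD[(Int, Seq[Iterable[Int]])]
val finalRDD = sc.parallelize(Seq(
  (1, Seq[Iterable[Int]](List(97), List(98), List(99), List(100))),
  (2, Seq[Iterable[Int]](List(97, 98), List(97, 99), List(97, 101))),
  (3, Seq[Iterable[Int]](List(97, 98, 99), List(99, 102, 103)))
))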
I would like to write this RDD to a text file in the following format:
('97'), ('98'), ('99'), ('100')
('97', '98'), ('97', '99'), ('97', '101')
('97', '98', '99'), ('99', '102', '103')
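In other words, each inner Iterable[Int] should become a quoted tuple, and each key's sequence should become one line. I assume the quoting itself could be produced with mkString, something like this hypothetical helper (formatRow is my own name, not something from my job):

// Turns Seq(List(97, 98), List(97, 99)) into "('97', '98'), ('97', '99')"
def formatRow(chunk: Seq[Iterable[Int]]): String =
  chunk.map(xs => xs.mkString("('", "', '", "')")).mkString(", ")

What I cannot figure out is how to get those lines from the RDD into a file.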
I found many websites suggesting the PrintWriter class from java.io as one option to achieve this. Here is the code I have tried:
import java.io.{File, PrintWriter}

val writer = new PrintWriter(new File(outputFName))

def writefunc(chunk: Seq[Iterable[Int]]): Unit = {
  print("inside write func")  // debug marker
  for (i <- 0 until chunk.length) {
    writer.print("('" + chunk(i) + "')" + ", ")
  }
}

finalRDD.mapValues(list => writefunc(list)).collect()
I ended up getting the Task not serializable error shown below:
finalRDD.mapValues(list =>writefunc(list)).collect()
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:340)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:330)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:156)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1.apply(PairRDDFunctions.scala:758)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1.apply(PairRDDFunctions.scala:757)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.PairRDDFunctions.mapValues(PairRDDFunctions.scala:757)
... 50 elided
Caused by: java.io.NotSerializableException: java.io.PrintWriter
Serialization stack:
- object not serializable (class: java.io.PrintWriter, value: java.io.PrintWriter@b0c0abe)
- field (class: $iw, name: writer, type: class java.io.PrintWriter)
- object (class $iw, $iw@31afbb30)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@672ca5ae)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@528ac6dd)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@b772a0e)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@7b11bb43)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@94c2342)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@2bacf377)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@718e1924)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@6d112a64)
- field (class: $iw, name: $iw, type: class $iw)
- object (class $iw, $iw@5747e8e4)
- field (class: $line411.$read, name: $iw, type: class $iw)
- object (class $line411.$read, $line411.$read@59a0616c)
- field (class: $iw, name: $line411$read, type: class $line411.$read)
- object (class $iw, $iw@a375f8f)
- field (class: $iw, name: $outer, type: class $iw)
- object (class $iw, $iw@4e3978ff)
- field (class: $anonfun$1, name: $outer, type: class $iw)
- object (class $anonfun$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:337)
... 59 more
I'm still learning Scala. Could someone suggest how to write a "Seq[Iterable[Int]]" object to a text file?
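From the error, I suspect that the closure passed to mapValues is capturing writer, and java.io.PrintWriter is not serializable, so Spark cannot ship it to the executors. One fix I am considering (just a sketch, assuming the collected data fits in driver memory, and reusing the hypothetical formatRow helper from above) is to keep all the file I/O on the driver:

import java.io.{File, PrintWriter}

// Collect to the driver and write there, so no writer has to cross the wire.
val writer = new PrintWriter(new File(outputFName))
try {
  finalRDD.collect().foreach { case (_, chunk) =>
    writer.println(formatRow(chunk))
  }
} finally {
  writer.close()
}

Alternatively, I gather that formatting each row inside map and then calling saveAsTextFile would let Spark write the output in parallel, at the cost of producing a directory of part files instead of a single text file. Is either of these the right way to go?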