I want to multiply two large matrices using Spark and Scala and save the output. I am using the following code:
val rows = file1.coalesce(1, false).map { x =>
  val line = x.split(delimiter).map(_.toDouble)
  // Build a sparse vector from the non-zero entries of the row
  Vectors.sparse(line.length,
    line.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}
val rmat = new RowMatrix(rows)

val dm = file2.coalesce(1, false).map { x =>
  val line = x.split(delimiter).map(_.toDouble)
  Vectors.dense(line)
}

// Collect the second matrix to the driver as a local dense matrix.
// Matrices.dense expects column-major data, hence the transpose below.
val ma = dm.map(_.toArray).take(dm.count.toInt)
val localMat = Matrices.dense(dm.count.toInt,
  dm.take(1)(0).size,
  transpose(ma).flatten)

// Multiply the distributed matrix by the local matrix and save the result
val s = rmat.multiply(localMat).rows
s.map(x => x.toArray.mkString(delimiter)).saveAsTextFile(OutputPath)
def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
  (for (c <- m(0).indices) yield m.map(_(c))).toArray
}
Saving the result takes a long time and the output file is very large. What is the optimized way to multiply two large matrices and save the output?
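I am also wondering whether converting both inputs to BlockMatrix would scale better, since that keeps both operands distributed instead of collecting the second matrix to the driver. A rough sketch of what I have in mind (rowsRDD1 and rowsRDD2 are placeholders for RDD[IndexedRow] values parsed from the two files, and the 1024x1024 block size is just a guess):

```scala
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRowMatrix}

// Wrap each RDD[IndexedRow] in an IndexedRowMatrix, then split it into blocks
val blockA: BlockMatrix = new IndexedRowMatrix(rowsRDD1).toBlockMatrix(1024, 1024).cache()
val blockB: BlockMatrix = new IndexedRowMatrix(rowsRDD2).toBlockMatrix(1024, 1024).cache()

// Distributed block-wise multiplication; neither operand is collected locally
val product: BlockMatrix = blockA.multiply(blockB)

// Convert back to rows and save, one delimited line per row
product.toIndexedRowMatrix().rows
  .map(r => r.vector.toArray.mkString(delimiter))
  .saveAsTextFile(OutputPath)
```

Would this approach avoid the driver-side collect, and is it the recommended way to multiply two distributed matrices in MLlib?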