
There is a matrix and I want to perform the dot product of it with a vector. Following is the Scala code:

val matrix = sc.parallelize(List(
  (("v1","v1"),2),(("v1","v2"),4),(("v1","v3"),1),(("v2","v2"),5),
  (("v2","v3"),1),(("v3","v3"),2)))

val vector = sc.parallelize(List(("v1",4),("v2",1),("v3",5)))

val dotproduct = matrix.flatMap{x => { 
  vector.flatMap { y => { 
    if(x._1._2 == y._1) Tuple2(x._1._1, x._2 * y._2)
  }}
}}.reduceByKey((_,_) => _+_)

But the following error occurred:

<console>:25: error: type mismatch;
found   : (String, Int)
required: TraversableOnce[?]
val dotproduct = matrix.flatMap{ x => { vector.flatMap { y => { if(x._1._2 == y._1) (x._1._1, x._2 * y._2) }}}}.reduceByKey((_,_) => _+_)
                                                                                         ^

I don't know if the nesting operation in RDD is OK. Does Spark MLlib provide any API to perform the dot product between matrix and vector?

victorming888

2 Answers


I don't know if the nesting operation in RDD is OK.

It is not OK. Spark doesn't support nested actions, transformations or distributed data structures.
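Independent of the nesting problem, the type error itself comes from the `if` without an `else`: `flatMap` expects every element to map to a `TraversableOnce`, while a conditional with no `else` branch has type `Any`. A minimal sketch on plain Scala collections (the names here are illustrative only):

```scala
val xs = Seq(1, 2, 3, 4)

// Wrapping the conditional result in an Option (or a Seq) gives flatMap
// the TraversableOnce it requires; None simply contributes nothing.
val odds = xs.flatMap(x => if (x % 2 == 1) Some(x * 10) else None)
// odds == Seq(10, 30)
```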

Does Spark MLlib provide any API to perform the dot product between matrix and vector?

RowMatrix provides a multiply method which accepts a local matrix. It should work just fine in your case.

import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

val idx = "^v([0-9]+)$".r

val rdd = sc.parallelize(List(
  (("v1", "v1"), 2), (("v1", "v2"), 4),
  (("v1", "v3"), 1), (("v2", "v2"), 5),
  (("v2", "v3"), 1), (("v3", "v3"), 2)
))

val mat = new CoordinateMatrix(rdd.map { case ((idx(i), idx(j)), v) => 
  MatrixEntry(i.toLong - 1, j.toLong - 1, v.toDouble)
}).toIndexedRowMatrix

val vector = Matrices.dense(3, 1, Array(4.0, 1.0, 5.0))
mat.multiply(vector).rows

If the vector is too large to be handled in memory you can use block matrices. See Matrix Multiplication in Apache Spark

Regarding your code you can for example do something like this:

matrix
  .map{case ((i, j), v) => (j, (i, v))}
  .join(vector)
  .values
  .map{case ((i, v1), v2) => (i, v1 * v2)}
  .reduceByKey(_ + _)
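To see what this pipeline computes, here is the same join-and-reduce logic mirrored on plain Scala collections with the question's data (a local sanity check, not Spark API):

```scala
// The question's matrix entries and vector as local collections.
val matrix = Seq(
  (("v1", "v1"), 2), (("v1", "v2"), 4), (("v1", "v3"), 1),
  (("v2", "v2"), 5), (("v2", "v3"), 1), (("v3", "v3"), 2))
val vector = Seq(("v1", 4), ("v2", 1), ("v3", 5)).toMap

val dot = matrix
  .map { case ((i, j), v) => (j, (i, v)) }                           // key by column j
  .flatMap { case (j, (i, v)) => vector.get(j).map(w => (i, v * w)) } // the "join"
  .groupBy(_._1)                                                      // like reduceByKey
  .map { case (i, ps) => (i, ps.map(_._2).sum) }
// dot == Map("v1" -> 17, "v2" -> 10, "v3" -> 10)
```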

or with local "vector" (optionally broadcasted):

val vector = Map("v1" -> 4, "v2" -> 1, "v3" -> 5).withDefault(_ => 0)

matrix.map{case ((i, j), v) => (i, v * vector(j))}.reduceByKey(_ + _)
zero323

Assuming that by dot product you mean just the ordinary matrix-vector multiplication, you could use the multiply method from the mllib.linalg package:

import org.apache.spark.mllib.linalg.{Matrices, Vectors}

// Collect the six entries into a local 3x2 column-major matrix, then transpose.
val mlMat = Matrices.dense(3, 2, matrix.collect().map(_._2.toDouble)).transpose
val mlVect = Vectors.dense(vector.collect().map(_._2.toDouble))
mlMat.multiply(mlVect)
//org.apache.spark.mllib.linalg.DenseVector = [17.0,31.0]
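Since the six collected values fill a 3×2 column-major matrix, the transpose is 2×3 and the product is a length-2 vector. The arithmetic can be checked in plain Scala, without MLlib:

```scala
// Rows of the transposed 2x3 matrix, i.e. the columns of the
// 3x2 column-major layout (2, 4, 1) and (5, 1, 2).
val mt = Array(Array(2.0, 4.0, 1.0), Array(5.0, 1.0, 2.0))
val v  = Array(4.0, 1.0, 5.0)

// Row-by-row dot products.
val product = mt.map(row => row.zip(v).map { case (a, b) => a * b }.sum)
// product.toSeq == Seq(17.0, 31.0)
```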
Christian Hirsch