I am reading a table from a MySQL database in a Spark project written in Scala. It's my first week with it, so I am not very experienced yet. When I try to run
val clusters = KMeans.train(parsedData, numClusters, numIterations)
I get an error for parsedData that says: "type mismatch; found : org.apache.spark.rdd.RDD[Map[String,Any]] required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]"
My parsed data is created above like this:
val parsedData = dataframe_mysql.map(_.getValuesMap[Any](List("name", "event","execution","info"))).collect().foreach(println)
where dataframe_mysql is whatever is returned from the sqlcontext.read.format("jdbc").option(....) call.
How am I supposed to convert parsedData to the type that the train function requires?
According to the documentation, I am supposed to use something like this:
data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
Am I supposed to convert my values to Double? When I try to run the command above, my project crashes.
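As far as I understand, the documentation line assumes each record is a string of space-separated numbers, which is not what my rows look like. Below is a minimal sketch of the conversion I think is needed, written in plain Scala so it can be tried without Spark. It assumes the selected columns hold numeric values or numeric strings, which may not be true for columns such as name or info; `RowToFeatures`, `toDouble` and `toFeatures` are hypothetical helpers of my own, not part of Spark.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): turns one row, represented as a
// Map[String, Any], into an Array[Double] following a fixed column order.
object RowToFeatures {

  // Convert a single cell to Double; returns None for null, non-numeric
  // strings and unsupported types.
  def toDouble(value: Any): Option[Double] = value match {
    case i: Int                  => Some(i.toDouble)
    case l: Long                 => Some(l.toDouble)
    case f: Float                => Some(f.toDouble)
    case d: Double               => Some(d)
    case b: java.math.BigDecimal => Some(b.doubleValue)
    case s: String               => Try(s.trim.toDouble).toOption
    case _                       => None
  }

  // Returns Some(features) only if every requested column is present and
  // convertible; otherwise None, so that bad rows can be dropped.
  def toFeatures(row: Map[String, Any], columns: Seq[String]): Option[Array[Double]] = {
    val values = columns.map(c => row.get(c).flatMap(toDouble))
    if (values.forall(_.isDefined)) Some(values.flatten.toArray) else None
  }
}
```

In the Spark job, each resulting array would presumably be wrapped with `Vectors.dense(...)`, and the result kept as an RDD rather than ending the chain with `collect().foreach(println)`, since `foreach` returns Unit.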
Thank you!