I understand that Vectors and DataFrames are supposed to play nice in Spark 2.0+. I have a very simple case and a weird-looking error: a DataFrame with a single column of SparseVectors that I can't apply any Vector method to via a UDF. The analyzer says the UDF requires a vector as its argument, yet a vector is exactly what I'm passing in. Any idea what is going wrong here?
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}
// This is my DataFrame: a single column of sparse vectors
vecCol
// org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [normFeatures: vector]
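In case it helps to reproduce, vecCol above can be built along these lines (a minimal sketch, not my actual job: it assumes a spark-shell style session with spark in scope, and uses Normalizer on made-up data purely to end up with a normFeatures vector column):

import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.ml.linalg.Vectors

// Made-up sparse rows, just enough to get a one-column DataFrame of vectors
val raw = spark.createDataFrame(Seq(
  Tuple1(Vectors.sparse(4, Array(0, 3), Array(1.0, 3.0))),
  Tuple1(Vectors.sparse(4, Array(1), Array(2.0)))
)).toDF("features")

// Normalizer is the kind of ML stage that yields a "normFeatures" column
val vecCol = new Normalizer()
  .setInputCol("features")
  .setOutputCol("normFeatures")
  .transform(raw)
  .select("normFeatures")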
// It fails no matter which Vector method I use in the UDF
val vecToArray = udf((v: Vector) => v.toArray)
vecCol.withColumn("withArray", vecToArray($"normFeatures"))
// org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(normFeatures)' due to data type mismatch:
// argument 1 requires vector type, however, '`normFeatures`' is of vector type.;
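Since the exception complains that a vector is not a vector, I suspect the UDF's parameter type and whatever is actually stored in the column differ in some way that the message doesn't surface. Here's how I can inspect the runtime class of the stored values (a one-liner sketch, purely diagnostic; getAs[Any] just avoids committing to a type):

vecCol.first().getAs[Any]("normFeatures").getClass.getName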